SUMMARY PAGE


THE PROBLEM

To evaluate the present status of equipment for processing helium speech and to assess expected developments toward reliable voice communication within hyperbaric helium-oxygen environments.

FINDINGS 

A workshop on helium-speech processing was attended by foreign scientists, U.S. Navy scientists, operational personnel, Naval and independent contractors, and academic speech scientists, all of whom have been active in underwater communications. Ten papers were presented, a forum and discussion were held, and a summary with comments was presented. It was concluded that hyperbaric helium speech can finally be corrected, and that a system that is small, inexpensive and reliable must now be designed and incorporated into diving operations.

APPLICATION 

Information contained in this report is useful for the design of systems intended to improve voice communication for divers who operate within hyperbaric helium atmospheres.
ABSTRACT


This report is a detailed summary of the proceedings of a workshop on helium-speech processing held during August 1971.

The meeting was jointly sponsored by the Office of Naval Research and the Bureau of Medicine and Surgery. It was held at the Naval Submarine Medical Research Laboratory in Groton, Connecticut. Approximately 40 participants were brought together, including foreign scientists, U.S. Navy scientists, operational personnel, Naval and independent contractors, and academic speech scientists, all of whom have been active in underwater communications. Formal papers were presented and discussed, a forum and discussion period was held, and a summary with comments was presented. Progress and future developments toward reliable speech communication under hyperbaric helium-oxygen conditions were assessed. The concept of helium-speech processing was broadened from the need for an unscrambler unit alone to one that includes an understanding of the nature of the constraints allied to the unscrambler; that is, the talker, listener, face mask and transducer.

It was concluded that, after a decade of research, the ability to correct hyperbaric helium speech finally exists. A system that is small, inexpensive, rugged and reliable must now be designed and incorporated into diving operations.
PREFACE


Initial recognition of the need for a workshop on processing helium speech evolved from discussions among personnel from the Bureau of Medicine and Surgery, the Office of Naval Research, and the Naval Submarine Medical Research Laboratory. These discussions concluded that an extensive, yet fragmented, body of information existed in the area of helium-speech processing and underwater voice communication. This workshop, therefore, was undertaken to bring together interested scientists and military personnel in order to unify present knowledge and indicate directions for future research and development. Dr. Charles F. Gell, Scientific Director of the Submarine Medical Research Laboratory, and Dr. Gilbert C. Tolhurst, former head of the Physiological Psychology Branch at the Office of Naval Research, inspired the initial proposal for the workshop and thereafter provided guidance through its formative stages to successful completion of the program. Major speakers and participants were selected on the basis of their active interests in underwater voice communication, with special emphasis on how to process helium speech. Thus the scope of the subject matter, while somewhat restrictive, nevertheless extends to all phases of the problem of underwater voice communication: history, preliminary approaches to the problems of unscrambling speech, theoretical implications and solutions, phonology, systems design and operational requirements.

The report of the proceedings which follows was based on edited recordings made during the sessions as well as manuscripts submitted by each of the major contributors. The result is a single, current report on processing helium speech, with critical references thereto. The report is especially timely in that it appears at a major turning point in advancing the state of the art of processing the speech of divers and swimmers. As such, it is a valuable source for scientists, engineers and the operating forces of the U.S. Navy.


J. D. Bloom, CDR, MC, USN
Officer-in-Charge
Naval Submarine Medical Research
Laboratory
placed in the mask or hard hat. The frequency output of the resonators, suitably shifted downward, would then be summed and amplified. The space requirements seemed excessive.

e. Various configurations of vocoder techniques have been advanced by a number of investigators. In theory, a design using formant-vocoder tracking should produce an output that is not only intelligible but also preserves more of the speaker's individual characteristics, i.e., sounds more natural. Until two or three years ago such a system would have had to be excessively large, but microminiaturization techniques now include both active and passive filtering. This methodology may be the next logical design step to attempt. There probably will be other ideas, circuitry and procedures arising from the contributions of the participants of this workshop.

If any schemes, instruments, or developments have been omitted from this brief history, or given too short or cursory a description, it was truly unintentional.
3. INTELLIGIBILITY AND PERCEPTUAL ASPECTS OF HELIUM SPEECH

by Robert F. Coleman


In evaluating a communication system, the entire chain between a talker and a listener must be included, or erroneous estimates of performance can be obtained. Thus, testing of hardware is fine, but it represents only the first step in an overall evaluation of the system. In addition, a significant amount of research now available indicates that the speaker particularly, and the listener to a lesser degree, are responsible for a good deal of the poor performance of commercially available communication systems. The initial coupling of the diver/talker to the transducing system, at least in air mixtures using conventional microphones, results in approximately a 20% degradation of the message to be transmitted. Recent developments by Morrow et al. 32,33 appear to indicate that this coupling effect can be compensated for by microphone design or other system changes.

"Intelligibility" refers primarily to a property of speech communication involving meaning, rather than simple recognition of specific speech sounds. Since "meaning" is a wastebasket term for human processing of speech, there are logically many different types of meaning, which are not necessarily equivalent. The upshot of this rather academic discussion is that we must be very cautious about assuming that, because a communicator achieves a score of 90% on a monosyllabic word test, it will be "90% intelligible" in all situations. Such is not the case.


A major point I wish to make is that intelligibility scores are relative rather than absolute; the further you extrapolate from the test material used, the greater the likelihood that you will make an error in your evaluation of a particular system. Thus, the most logical solution to measurement would be to test all communicators under consideration under the same conditions and with the same material. In this way, a relative ranking of the systems can be obtained. It cannot logically be said, however, that one communicator is "15% better" than another, because the results are simply ranked data, not absolute measures of percent intelligibility.
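A minimal Python sketch of the ordinal treatment argued for above: scores obtained with the same material and conditions are turned into ranks and nothing more. The system names and scores are hypothetical, and the tie rule (standard competition ranking, 1, 1, 3) is an assumption rather than something the author prescribes.

```python
def rank_systems(scores: dict[str, float]) -> list[tuple[int, str]]:
    """Convert test scores into (rank, system) pairs, best first.

    Only the ordering is kept; equal scores share a rank and the next
    rank is skipped (1, 1, 3), i.e. standard competition ranking.
    """
    # sorted() is stable even with reverse=True, so tied systems keep input order.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks: list[tuple[int, str]] = []
    previous_score = None
    current_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            current_rank = position
            previous_score = score
        ranks.append((current_rank, name))
    return ranks


# Hypothetical percent-correct scores, all from one test session.
example = {"communicator A": 62.0, "communicator B": 47.0, "communicator C": 62.0}
print(rank_systems(example))
```

The result says only that A and C outrank B under these conditions; by design it carries no claim that either is "15 points better".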

Space does not permit an exhaustive discussion of the factors involved in obtaining intelligibility scores; however, we can list a few of them, such as word familiarity, syllable length and complexity, word-use frequency, redundancy in sentences, talker and listener differences, and closed vs. open response sets.


Attempts have been made by myself and several other people 24,35,45,4o to sort out the specific effects of air and HeO2 mixtures on intelligibility. Most of us have used closed response sets, or gone to an exhaustive matrix assuming any response was allowable. The results of all the studies put together, frankly, are rather unproductive. There appears to be a rather strong trend (for both air and HeO2 mixtures) for the manner of articulation of a particular phoneme to be preserved, with indications that the place aspect of phonemes is distorted. The voicing feature appears to be relatively stable. In work with normal air mixtures, a tendency for medial phonemes to become peripheral was noted, although at depths below 200 feet the changing acoustic coupling and transmission losses through the human pharyngeal wall appear to wipe out this effect; as a matter of fact, the tendency is for peripheral phonemes to be heard as medial, such as /d/ for /b/, /t/ for /k/, etc.

Specific intelligibility scores for various communicators are available in the literature. 23,24 From the success, or the lack of success, we have had in attempting to establish some degree of order in intelligibility testing, several conclusions appear safe to state.

First, it is not likely that any communication lexicon based on phonemic restriction will be successful; relationships between phoneme classes appear to change as a function of depth and mixture.

Second, it is desirable to test communication devices with groups of naive, but live, listeners, using a closed set of test sentences or words.

Third, great caution must be taken in using multiple forms of the same test under the assumption that "equivalent word lists" are in fact equivalent at depth. (Closed sets of words and sentences would greatly reduce this problem.)

Fourth, in actual operations, a restricted lexicon is needed to increase the probability of a particular word occurring.

Fifth, a lexicon which includes certain types of words in specific positions within a transmission is needed.

Finally, the most obvious single need in evaluating communicators is to standardize presentation, conditions, depths, etc., so that intelligent decisions can be made concerning the relative merits of the systems under consideration. Expensive and time-consuming live talker/listener tasks in a standard setting come as close to operational evaluation as any method I can think of, and are infinitely better than submerging a group of divers with different systems on and having each one count to ten.
4. REQUIREMENTS OF THE IDEAL HELIUM SPEECH COMMUNICATIONS SYSTEM


by J. H. Elkins


INTRODUCTION

Diver communications have been needed since man first ventured beneath the sea. The early diver depended on hand-line signals from the surface, and for many years this method was the only means available. Work on underwater communications was therefore started on a limited, piecemeal basis. But before we ever really understood the problem, and certainly before adequate equipment was developed, we were face-to-face with another problem: the breathing of exotic gas by deep divers poses unique problems in voice communications. Consequently, we began solving the second problem without solving the first. At this point I would like to back up and review the progress in underwater communications over the years.

The communications problem for the hard-hat diver was solved to a very limited degree in 1925; however, the advent of self-contained underwater breathing apparatus (SCUBA) brought with it a completely new set of problems. The diver was no longer speaking in an air environment (helmet), there was no tether to the surface, and, to further compound the problem, the diver had a mouthpiece which filled his mouth and virtually precluded voice communications.

Various attempts were made to provide voice communications by the use of throat microphones, bone-conduction pickups, etc., but in the final analysis they all provided little improvement over no communications at all, and it was finally recognized that the diver could not talk with the mouthbit in place. This factor led to the design of a full facemask.

The full facemask, and later the oral-nasal facemask, was an important step in the right direction. The first full facemasks improved communications but also generated physiological problems in the form of carbon dioxide poisoning due to the large "dead air space" within the mask. The oral-nasal mask was designed to reduce this dead air space, and accomplished this objective to some extent.

The communications equipment for the oral-nasal mask (AN/PQC-1) was designed along the lines of surface communications equipment and became the first piece of underwater communications equipment designed for use by the U.S. Navy. This equipment was a masterpiece of packaging for its era (1957) but provided an intelligibility of less than 14 percent. Three factors contributed to this low intelligibility. The mask used was the first full facemask and had a poor acoustic response. The bandwidth was even narrower than that of the common telephone (which is itself insufficient), and the microphone was a carbon-button microphone very similar to that used with World War II radio transmitters and noted for its distortion. These factors added up to a poor communication system.

The second underwater communications equipment for fleet use (1962-64) was the AN/PQC-1A. This equipment was designed in the early days of transistors and was essentially a transistorized version of the AN/PQC-1. It used the same microphone, the same bandwidth, and a physiologically slightly improved mask, but since transistor technology had not advanced sufficiently, this equipment proved to be little better than its predecessor.

Since the unsuccessful AN/PQC-1A, a number of commercial items have shown promise.

Aquasonics (now Hydro Products) was responsible for a series of equipments operating on a frequency of 42 kHz, amplitude modulated, which provided intelligibilities of approximately 70 percent. This equipment was limited in range (300 yards), and it made use of the NAUTILUS mask, which is physiologically unacceptable. The reliability of this equipment was poor because of the excess cabling from the diver's belt to the mask and hood, which was subject to failure.

Since these developments, many attempts have been made to solve the communications problems of the diver, but few of those attempting it have had the background, experience, intimate knowledge of the environment and technical know-how necessary to succeed.

One of the common mistakes is for a designer to conceive a new method of transmitting the signal through water, design the equipment, and only then begin to look for a microphone and earphone with which to use the new equipment. A search reveals no microphones or earphones specifically designed for use under water, so the designer resorts to one of two possible approaches. In the first, a conventional air microphone is used but "modified" for use under water. The modification consists of a waterproof covering, which destroys the sensitivity and frequency response so vital to successful underwater communications.

The other approach, equally unsuccessful, entails the decision to develop a microphone for use with the new equipment. The bandwidth chosen is based on what is required on telephone circuits and is not sufficient for a diver at depth. Even if the designer happens to be correct in bandwidth selection, the pressure to which the microphone or earphone is subjected will either make it insensitive, change its frequency response, or render it inoperative.

Some equipment developed on the commercial market has achieved some degree of success, as mentioned earlier, and hard-wire communications developed for Sealab II and III by NCSL have achieved intelligibilities of 80 percent (at shallow depths) and could be adapted for other tethered applications.

Now, back to the more recent problem of helium speech. To avoid nitrogen narcosis, divers breathe helium-oxygen at depths of 200 feet or more, with a resultant upshift in voice frequencies to the detriment of intelligibility. Downshifting the frequencies has been attempted by means of various unscramblers, with varied success.

The "high-voiced" quality of speech when breathing helium-oxygen results because the formant frequencies of helium speech are related to those of normal speech in air by the ratio of the sound velocities in the two media. The velocity of sound in dry air is about 331 m/s, and in helium about 970 m/s (both at 0°C).
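The proportional relationship just stated can be checked with a short Python sketch. It computes both sound speeds from the ideal-gas formula c = sqrt(γRT/M) at 0°C and scales an air formant by their ratio. It assumes pure helium (real breathing mixtures contain some oxygen, which lowers the ratio), and the purely proportional model is the simplification stated in the text; measured shifts, particularly of the lowest formant, are reported to be somewhat smaller.

```python
import math

R = 8.314  # molar gas constant, J/(mol*K)


def sound_speed(gamma: float, molar_mass_kg: float, temp_k: float = 273.15) -> float:
    """Ideal-gas speed of sound, c = sqrt(gamma * R * T / M), in m/s."""
    return math.sqrt(gamma * R * temp_k / molar_mass_kg)


C_AIR = sound_speed(1.400, 0.02897)      # dry air: about 331 m/s
C_HELIUM = sound_speed(5 / 3, 0.004003)  # pure helium: about 972 m/s
SHIFT = C_HELIUM / C_AIR                 # about 2.9


def helium_formant(f_air_hz: float) -> float:
    """Formant frequency in helium under the simple proportional model."""
    return f_air_hz * SHIFT


for f_air in (500.0, 1500.0, 2500.0):  # illustrative air formants, Hz
    print(f"{f_air:6.0f} Hz in air -> {helium_formant(f_air):6.0f} Hz in helium")
```

A first formant near 500 Hz in air lands near 1.5 kHz, and a third formant near 2.5 kHz lands above 7 kHz, which is why the upper band edge matters so much in what follows.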

Diving to depths of 600 feet is routine today, and we have every reason to expect that depths of 1000 feet and beyond will become routine within a few years. Based on these projections, the helium-speech problem is likely to become even more acute in the near future.

Some helium-speech unscramblers tested by the University of Florida 24 in May 1970 attained scores of only 45 percent. This emphasizes the need for more work in this area. There are various theories which attempt to account for the somewhat limited success of helium-speech processors. In my opinion, the largest contributor is the lack of knowledge of the requirements for the microphones that feed the unscramblers. One early unscrambler incorporated circuitry which actually limited the passband to an upper limit of 5000 Hz. The natural tendency is toward a microphone with a flat response regardless of the bandwidth, but inside a facemask a flat response is not what is required; rather, a 12 dB/octave upslope at the high end is needed.
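Elkins gives the slope (12 dB/octave) but not the frequency at which the upslope begins. The Python sketch below evaluates the idealized straight-line (asymptotic) gain of such a high-end upslope; the 1.5 kHz corner in the example is a hypothetical value, and a real filter departs from the straight line near its corner.

```python
import math


def preemphasis_gain_db(freq_hz: float, corner_hz: float,
                        slope_db_per_octave: float = 12.0) -> float:
    """Asymptotic gain of a high-frequency upslope, in dB.

    0 dB at and below the corner; above it, the gain rises by
    slope_db_per_octave for every doubling of frequency.
    """
    if freq_hz <= 0 or corner_hz <= 0:
        raise ValueError("frequencies must be positive")
    if freq_hz <= corner_hz:
        return 0.0
    return slope_db_per_octave * math.log2(freq_hz / corner_hz)


CORNER_HZ = 1500.0  # hypothetical corner frequency
for f in (1000.0, 3000.0, 6000.0, 12000.0):
    print(f"{f:7.0f} Hz: +{preemphasis_gain_db(f, CORNER_HZ):4.1f} dB")
```

At 12 dB/octave the top of a 10 kHz band sits more than 30 dB above the corner, so the choice of corner frequency matters as much as the slope.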

REQUIREMENTS

Diver. One requirement for underwater communications which is totally unrelated to equipment, but just as vital, is the qualification of the diver who wears the equipment.

Diver experience is a large contributor to successful communications. Some of the new masks are frightening to the inexperienced, who find it difficult to make themselves understood while wearing these masks regardless of the equipment under test.

Input. Electronically speaking, one of the first requirements for helium-speech processors is a good-quality "front end," that is, the electronics which precede the unscrambler and determine the quality of the input data. Many of the early tape recordings of helium speech which were used as a basis for equipment design were of notoriously poor quality. They were made with poor microphones, or microphones of unknown characteristics, connected to unshielded cables and recorded on recorders of questionable quality. So we might say that the early developer was doomed to failure before he began. Today, however, the situation is much improved, as you will learn from Dr. Morrow's talk, which follows my own.

Bandwidth is a vital parameter that is often overlooked. We are no longer speaking in the range 200-3500 Hz, as with the telephone, but in something more like 300 Hz to 10 kHz. Actual sonograms of vocal data taken at a depth of 460 feet make it obvious that an upper limit of at least 10 kHz is required. It is of interest that the word "team" was reconstructed by use of an unscrambler, indicating that there is still progress to be made.
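The 10 kHz figure follows from the upward shift: if the useful part of air speech extends to the telephone limit of about 3.5 kHz and every frequency is multiplied by a single shift factor, the channel must extend to 3.5 kHz times that factor. The Python sketch below assumes such a single proportional factor; the value 2.9 (close to the air-to-pure-helium sound-speed ratio) is an illustrative assumption, and real mixtures at depth shift somewhat less.

```python
def required_upper_limit_hz(air_upper_hz: float, shift_factor: float) -> float:
    """Upper band edge needed to pass the same speech content after the upshift."""
    return air_upper_hz * shift_factor


def air_equivalent_limit_hz(passband_upper_hz: float, shift_factor: float) -> float:
    """Highest air-speech frequency that survives a given upper band edge."""
    return passband_upper_hz / shift_factor


SHIFT = 2.9  # illustrative shift factor (assumption)
print(round(required_upper_limit_hz(3500.0, SHIFT)))  # about 10 kHz
print(round(air_equivalent_limit_hz(5000.0, SHIFT)))  # a 5 kHz channel keeps ~1.7 kHz of air speech
```

By the same arithmetic, the early unscrambler whose passband stopped at 5000 Hz preserved only the equivalent of about 1.7 kHz of air speech, narrower than a telephone channel.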

A good-quality mask and a good-quality microphone will not necessarily make a good communication system unless they are compatible. For example, the frequency response of a Bruel and Kjaer standard microphone is very flat in a simulated free field, but when the microphone is put inside a facemask, its response deteriorates drastically. The mask and microphone must therefore be considered together and not individually. Some microphones deteriorate when subjected to the deeper depths. Still others show different characteristics in different masks, and marked differences are noted between helium and air. These differences are, however, minimized in the case of the helmet.

Tests to date indicate that the microphone developed by Dr. Morrow (LTV microphone), 32,33 compared with two other types, is superior, and that we can now start out toward a complete unscrambling unit with a quality microphone.

Conversion. Real-time conversion is mandatory for practical communications. Using data taken with the LTV microphone, 32 reasonably intelligible speech can be obtained simply by slowing down the tape recorder. This is an impractical method of speech conversion, since it cannot operate in real time, but it does prove that all of the important frequencies are detected by the microphone.
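Why tape slowdown works, and why it cannot be real-time, is easy to show numerically. Replaying at a fraction r of the recording speed multiplies every frequency by r and stretches time by 1/r, so the listener falls behind by (1/r - 1) seconds for every second of speech. A Python sketch, assuming a single shift factor of 2.9 (an illustrative value, not one from the text):

```python
def slowed_playback(freq_hz: float, duration_s: float,
                    speed_ratio: float) -> tuple[float, float]:
    """Frequency and duration heard when a tape is replayed at speed_ratio.

    speed_ratio < 1 means slower than recording speed; every frequency is
    scaled by speed_ratio and the duration by 1 / speed_ratio.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return freq_hz * speed_ratio, duration_s / speed_ratio


def backlog_s(speech_s: float, speed_ratio: float) -> float:
    """How far playback lags behind after speech_s seconds of continuous speech."""
    return speech_s / speed_ratio - speech_s


SHIFT = 2.9          # illustrative helium shift factor (assumption)
ratio = 1.0 / SHIFT  # replay speed that undoes the shift
print(slowed_playback(1450.0, 2.0, ratio))  # 1450 Hz heard near 500 Hz; 2 s becomes about 5.8 s
print(backlog_s(60.0, ratio))               # one minute of talk leaves roughly two minutes of backlog
```

The frequencies come out right, but the backlog grows without bound, which is exactly the practical objection raised above.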

Output. Hearing thresholds vary a great deal, especially among divers, which dictates that audio levels should be somewhat higher than normal and adjustable.

Size. The unscrambler should be small and compact enough that the diver can carry it and be provided with a corrected version of his own voice, which has been shown to increase intelligibility because the diver adapts to what he hears.

Intelligibility. There are as many means of measuring intelligibility as there are people in this room. Intelligibility scores alone are meaningless unless accompanied by all the conditions under which they were obtained.

I will not attempt to discuss all of the various methods of intelligibility measurement but will only describe the way we make these measurements at the Naval Coastal Systems Laboratory in Panama City, Florida. We have recently gone to the Modified Rhyme Test (MRT). This method simplifies the taking of underwater data in that the diver does not write what he hears but chooses one of the six possible words on his list. These lists were statistically designed and computer-randomized for our ongoing effort in underwater communications under the sponsorship of the Naval Ship Systems Command and others. We have been using this method, or phonetically balanced word lists, for the past eight years. At this point, the MRT method appears to be the more practical.
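With a closed set of six words, a listener who guesses blindly still scores about 17 percent. One conventional way to score closed-set tests subtracts W/(k-1) from the number of right answers R, where W is the number wrong and k the number of alternatives. Elkins does not say whether NCSL applies this correction, so the Python sketch below is illustrative only, and ignoring omitted items is an assumption.

```python
def closed_set_percent(right: int, wrong: int, alternatives: int = 6) -> float:
    """Percent correct, corrected for guessing, for a closed-set test.

    Random guessing among k alternatives produces one right answer for
    every k - 1 wrong ones, so wrong / (k - 1) is subtracted from right.
    Omitted items are not counted (assumption).
    """
    answered = right + wrong
    if answered == 0 or alternatives < 2:
        raise ValueError("need at least one answer and two alternatives")
    corrected = right - wrong / (alternatives - 1)
    return 100.0 * corrected / answered


print(closed_set_percent(40, 10))  # raw 80 percent, 76 after the guessing allowance
print(closed_set_percent(10, 50))  # exactly chance performance on a six-word list scores 0
```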

Controls. The controls on helium-speech processors should be minimized. Often the diver has his hands full just staying alive. An on-off switch and a volume control should be all that is necessary. Any adjustments to compensate for different depths should be made either before the dive or automatically by pressure-sensitive devices.
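Automatic depth compensation amounts to mapping ambient pressure to a frequency-correction factor. The Python sketch below is one hypothetical way to do it: it converts depth to pressure (about 33 feet of seawater per atmosphere), assumes the oxygen partial pressure is held at 0.4 atm with the balance helium, and takes the correction as the ratio of ideal-gas sound speeds in the mixture and in air. The oxygen level, the 300 K temperature and the proportional model are all assumptions, not values from the text.

```python
import math

R = 8.314          # molar gas constant, J/(mol*K)
FT_PER_ATM = 33.0  # approx. depth of seawater per atmosphere of pressure


def ambient_pressure_atm(depth_ft: float) -> float:
    """Absolute pressure at a seawater depth, in atmospheres."""
    return 1.0 + depth_ft / FT_PER_ATM


def heliox_sound_speed(o2_fraction: float, temp_k: float) -> float:
    """Ideal-gas speed of sound in a helium-oxygen mixture, in m/s."""
    he_fraction = 1.0 - o2_fraction
    # Molar heat capacity at constant volume: helium 3/2 R, oxygen 5/2 R.
    cv = R * (1.5 * he_fraction + 2.5 * o2_fraction)
    gamma = (cv + R) / cv  # Cp = Cv + R for an ideal gas
    molar_mass = 0.004003 * he_fraction + 0.031999 * o2_fraction  # kg/mol
    return math.sqrt(gamma * R * temp_k / molar_mass)


def correction_factor(depth_ft: float, po2_atm: float = 0.4,
                      temp_k: float = 300.0) -> float:
    """Factor by which a processor would divide frequencies at this depth."""
    o2_fraction = min(1.0, po2_atm / ambient_pressure_atm(depth_ft))
    c_air = math.sqrt(1.4 * R * temp_k / 0.02897)
    return heliox_sound_speed(o2_fraction, temp_k) / c_air


for depth in (200, 600, 1000):
    print(f"{depth:5d} ft: divide by {correction_factor(depth):.2f}")
```

Under these assumptions the factor grows with depth as the oxygen fraction falls, so a single fixed setting would be slightly wrong at most depths; a pressure sensor feeding such a table is one way to meet the "automatic" requirement.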

Power. The power required should obviously be minimal, since the diver is carrying the speech processor. It should be possible with today's technology to combine the speech processor with other communications equipment; in fact, this is the approach that we at the Naval Coastal Systems Laboratory in Panama City are taking in our ongoing communications effort.


When this system is completed, about a year from now, we hope to provide the military diver with a modular communications system, whether he is tethered or untethered, saturated or unsaturated, at any depth.
5. THE INITIAL SPEECH TRANSDUCER AND ITS ENVIRONMENT*

by C. T. Morrow


The majority of microphones that have been used in deep dives are standard communication microphones with a response to only about 3 or 4 kilohertz (kHz), which is not improved by a deep-submergence atmosphere. When typical "high-fidelity" microphones are used, their high-frequency response is degraded by the deep-submergence atmosphere. Taped helium-speech recordings made at about 400 feet and supplied by Captain George Bond and by Harry Hollien of the University of Florida proved to be unintelligible to the novice listener. Sonograms showed no energy above 3 kHz.

If one considers the upward shift of the resonance regions that occurs in helium speech, the feasibility of using a standard microphone becomes even more discouraging.

The purpose of our program was to develop improved speech communication in diving masks used in shallow and deep-submergence atmospheres. As part of the program, construction of an experimental microphone with a response to 10 kHz was undertaken. The experimental models were conceived with the aid of A. J. Brouns. The initial gradient microphone contained a curvilinear aluminum diaphragm coupled at its center to a bimorph bar, as in Figure 1. Later revisions have used a plastic dome coupled to a bimorph ring. By compromising a certain amount of sensitivity, the mechanical resonance can be set near 20 kHz, which is out of range for speech even in helium atmospheres. Keeping the back of the microphone open makes it a gradient microphone. It is unlike most microphones, which have a diaphragm resonance in the speech frequency range and consequently have two sets of parameters affecting the response. The first is the mechanical properties, which don't change with depth; the second is the properties of the cavity, which do change with depth. Thus, even if a microphone is "high fidelity" when you start with it, it is not when you get down to depth.

*References 32 and 33 treat this subject more comprehensively.

Upon completion of the experimental microphone, the next step was to measure mask-cavity acoustics and the response of the microphone in simulated mask cavities. To do this, a model cavity having a constant diameter but variable volume was built, using plastic rings and an endcap which could be coupled to a basic mouthpiece; a large rectangular plexiglass cavity was also built.

An acoustic impedance calibrator was constructed using a Bruel and Kjaer (B&K) artificial voice, a Sierra Engineering 50th-percentile male anthropomorphic head, a sound transmission tube with a small amount of lamb's wool to decrease standing waves, and a sintered bronze disk just under 3/4 inch in diameter (to provide a high acoustic source impedance at the lips). A 1/4-inch B&K condenser microphone was mounted in the tube, just back of the disk, and was connected in an automatic gain control circuit to maintain constant sound pressure on the inside surface of the disk. A second 1/4-inch microphone was used to measure the sound pressure immediately in front of the disk. The masks and cavities tested were remotely positionable on and off the head by a motor and lead screw in the base, a convenience when tests were to be performed in a helium pressure chamber. Pressure response curves for typical experimental cavities, as measured on the calibrator to indicate their acoustic impedance relative to open space, were obtained. According to our initial thoughts, the 17.4 cubic inch cavity would be close to optimum for a mask without absorption, and preliminary listening tests with human speech appeared to confirm this. Except in the case of the 63 cubic inch rectangular plexiglass cavity, which had resonances in its front and rear walls, pressurization with helium shifted the curves upward in frequency essentially in accordance with the change in the speed of sound. Sound absorption was effective in eliminating standing waves, but this was found to be unnecessary for intelligibility when the gradient microphone was used close to the lips.

Fig. 1. (Morrow) The initial experimental microphone.

Measurement of transmission to the outside, as well as of acoustic impedance, was undertaken to understand the combined effects of cavity and transducer more fully. In the case of diving masks and other closed cavities, there is no useful radiation to the outside, and communication becomes dependent on a microphone that is not limited to instrumentation types. The relative transmission of a given microphone in a given location in a given cavity was obtained from its response to the calibrator, with and without the cavity. The response of the gradient microphone at the lips appeared simpler and more attractive than that of the pressure microphone. It needed no equalization to correct for bass boost from the mask cavity, and it promised somewhat better intelligibility. When close to the lips, the pressure microphone showed a transmission curve similar to the impedance curve for the same cavity, as might be expected. The shape was, however, clearly different for other pressure microphone locations in the cavity.

Some recordings of a diver's speech were made at simulated depths of 650 and 460 feet at the Experimental Diving Unit in Washington, D.C. A ceramic gradient microphone with its preamplifier was used. The preamplifier had a 6 dB/octave high-frequency preemphasis above about 1.5 kHz. This partly compensated for the decreased intensity of high-frequency harmonics in the voice in deep submergence. The microphone diaphragm resonance was at about 20 kHz, so no acoustic compensation was required. Such compensation would have degraded the deep-submergence performance.
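The preemphasis can be pictured with a short sketch. It assumes an idealized first-order boost with its corner at the quoted 1.5 kHz. The actual preamplifier circuit is not described here. The gain is flat well below the corner and rises at about 6 dB per octave well above it:

```python
import math

CORNER_HZ = 1500.0  # assumed corner frequency ("above about 1.5 kHz" in the text)

def preemphasis_gain_db(freq_hz: float, corner_hz: float = CORNER_HZ) -> float:
    """Gain of an idealized first-order preemphasis |1 + j f/fc|, in dB.

    Well below the corner the gain is close to 0 dB; well above it the gain
    rises at about 6 dB per octave (20 dB per decade).
    """
    return 10.0 * math.log10(1.0 + (freq_hz / corner_hz) ** 2)

for f in (500, 1500, 3000, 6000, 12000):
    print(f"{f:>6} Hz: {preemphasis_gain_db(f):5.1f} dB")
```

From 12 kHz to 24 kHz the computed boost grows by close to 6 dB, which matches the slope quoted in the text. Near the corner the rise is gentler.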

Using a "high fidelity"¹ headphone, the intelligibility of helium speech directly from the microphone, or from the playback head of the tape recorder, was disappointing. It was better than the recordings supplied by Bond and Hollien, but not by very much. When the helium speech was played back at one-half of the recording speed, the articulation improved to an estimated 90 percent or better. The effect was startling, given the generally poor reports on this method of translation and the rather poor intelligibility of the untranslated signal from the new microphone. In contrast, playing the tape at normal speed through an "unscrambler" of rather early design did not provide intelligible speech for any adjustment of the unscrambler. Within the reproducibility of the human voice, it made no difference whether the microphones were held in front of the lips by a boom on a headset or by a mount in the Kirby Morgan Clamshell Helmet. (Of course, if the helmet had been submerged, there would have been bubble noise; however, the gradient microphone would have provided some discrimination against it.) Likewise, a change in depth from 650 feet and 97.5% helium to 460 feet and 98% helium had no effect.

These observations were confirmed by sonograms. The helium speech was played back at half speed. For the same words spoken, the shapes of the formant frequency contours were very similar to those for sea-level air. However, in the half-speed playback they were about 40% higher in frequency. They were also more sharply tuned. There is no obvious nonlinearity in the formant shifts, which suggests that Fant's estimate of 150 to 200 Hz for the closed-lip resonance may be a little high.

Based on the information gathered during this program, there are several principles that could be applied, or already have been applied, to helium speech unscramblers.
Simply playing back the tape recordings at half speed serves very well for helium speech to 650 feet, but this cannot be done in real time. A tape head built into a spinning wheel, almost surrounded by a moving tape, has been used successfully to stretch or compress the time scale of speech without changing its quality. The device simply omits brief intervals on the tape or reproduces them more than once. The same principle has been applied to shifting frequency without changing the time scale. The most successful embodiments have incorporated small electronic computers to shift the frequencies.
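The frequency-shifting variant of this principle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism of any particular device. The signal is cut into fixed frames. The first 1/k of each frame is read out k times more slowly to fill the frame, and the rest of the frame is discarded. Duration is therefore kept while every frequency is divided by k. The function name and frame length are hypothetical.

```python
import math
from typing import List, Sequence

def compress_frequency(x: Sequence[float], k: float, frame_len: int) -> List[float]:
    """Divide the frequencies of x by k while keeping its duration.

    Each output frame of frame_len samples is filled by reading the input from
    the start of the same frame at 1/k of the normal rate, so only the first
    frame_len / k input samples of every frame are used; the rest are discarded.
    """
    if k < 1.0:
        raise ValueError("this sketch only handles compression (k >= 1)")
    out: List[float] = []
    for start in range(0, len(x), frame_len):
        for n in range(min(frame_len, len(x) - start)):
            pos = start + n / k                      # slowed-down read pointer
            i = int(pos)
            frac = pos - i
            nxt = x[i + 1] if i + 1 < len(x) else x[i]
            out.append((1.0 - frac) * x[i] + frac * nxt)  # linear interpolation
    return out

# Example: a 2 kHz tone sampled at 16 kHz becomes a 1 kHz tone of the same length.
fs = 16000
tone = [math.sin(2 * math.pi * 2000 * n / fs + math.pi / 4) for n in range(1600)]
lowered = compress_frequency(tone, k=2.0, frame_len=128)
```

With fixed frames, most signals will show small discontinuities where the frames join. The example tone was chosen so that its frames happen to join smoothly.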

A second method is to divide the spectrum into frequency bands and heterodyne each downward. Heterodyning a broad band lowers first formants by a larger proportion than second formants. It might therefore tend to compensate for pressure effects in nitrogen-oxygen or neon-oxygen atmospheres. For accurate compensation of the approximately linear shift caused by helium, a large number of bands would appear to be necessary. Translators available now are generally limited to two or three bands.
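The difference between heterodyning and frequency division can be shown with simple arithmetic. The formant values below are hypothetical. The heterodyne is idealized as a pure downward shift in which only the difference sideband is kept.

```python
from typing import List

def heterodyne(freqs_hz: List[float], shift_hz: float) -> List[float]:
    """Move every component down by the same number of hertz (one sideband kept)."""
    return [f - shift_hz for f in freqs_hz]

def divide(freqs_hz: List[float], k: float) -> List[float]:
    """Scale every component down by the same ratio k."""
    return [f / k for f in freqs_hz]

# Hypothetical helium-speech formants, for illustration only (Hz).
helium = [1200.0, 3600.0, 6000.0]

for label, shifted in (("heterodyne by 1 kHz", heterodyne(helium, 1000.0)),
                       ("divide by 2", divide(helium, 2.0))):
    ratios = [round(h / s, 2) for h, s in zip(helium, shifted)]
    print(f"{label}: {shifted} -> lowering ratios {ratios}")
```

The heterodyne lowers the 1.2 kHz component by a ratio of 6 but the 6 kHz component only by 1.2. Division lowers all three components by 2. For this reason, a single broad heterodyne band cannot undo a shift that is nearly proportional to frequency.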

A third method is to use an analyzer and synthesizer with a frequency translation incorporated between the two. If desired, the fundamental frequency can be left unchanged while the envelope of the harmonics is shifted.
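The following toy version illustrates the third method. The synthesizer keeps the harmonics at multiples of the original fundamental. Each harmonic takes its amplitude from the analyzed spectral envelope read at k times its frequency, which compresses the envelope by k. The single-peak envelope, the bandwidth and k = 2.5 are assumptions for illustration. A real analyzer would estimate the envelope from filter-bank outputs.

```python
from typing import Callable, List, Tuple

def resonance_envelope(formants_hz: List[float], bandwidth_hz: float) -> Callable[[float], float]:
    """Hypothetical smooth spectral envelope: one simple peak per formant."""
    half_bw = bandwidth_hz / 2.0
    def env(f: float) -> float:
        return sum(1.0 / (1.0 + ((f - fc) / half_bw) ** 2) for fc in formants_hz)
    return env

def translated_harmonics(env: Callable[[float], float], f0_hz: float, k: float,
                         max_hz: float) -> List[Tuple[float, float]]:
    """(frequency, amplitude) pairs for resynthesis with the envelope divided by k.

    The harmonics stay at multiples of the unchanged fundamental f0_hz; the
    harmonic at n*f0 takes the amplitude the analyzed envelope had at k*n*f0.
    """
    pairs = []
    n = 1
    while n * f0_hz <= max_hz:
        f = n * f0_hz
        pairs.append((f, env(k * f)))
        n += 1
    return pairs

helium_env = resonance_envelope([1500.0], bandwidth_hz=300.0)   # assumed analysis result
harmonics = translated_harmonics(helium_env, f0_hz=100.0, k=2.5, max_hz=4000.0)
print(max(harmonics, key=lambda p: p[1]))   # strongest harmonic after translation
```

The helium formant at 1.5 kHz reappears on the 600 Hz harmonic, while the harmonics themselves remain spaced at the original 100 Hz.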

It is generally reported that unscramblers must be carefully adjusted for each voice. The results of our investigations indicate that no such adjustment should be needed if the design can preserve or generate a satisfactory relative frequency relationship between formants. Further, if the percentage of helium is in the nineties, adjustments for depth should not be necessary.

With helium speech, the amount of frequency division should not be critical. Our tape experiments, together with other experiments in shifting playback speed, make it evident that halving the frequencies would be satisfactory for normal ears. For divers with severe high-frequency hearing losses, it may be preferable to divide by three or more.

In conclusion, some type of translator is necessary for understanding speech produced in deep-submergence helium atmospheres. In combination with this translator, an adequate microphone, such as the one used in this investigation, is necessary. The microphone discussed in this presentation performs as a gradient microphone to 10 kHz or better in helium. It is insensitive to mask cavity acoustics, and it withstands pressurization, decompression and the marine environment.
sponding resonance is shifted to 610 Hz, as indicated by the vertical dotted line. In a similar way it is easy to find how


was easier to understand as it increases the velocity of sound and accordingly shifts up the resonance from


6.  PRESENT  AND  FUTURE  WORK  IN  UNDERWATER  COMMUNICATIONS 
AT  THE  SPEECH  TRANSMISSION  LABORATORY  IN  STOCKHOLM 


by  J.  Lindqvist 


Our interest in divers' speech at the Speech Transmission Laboratory started in 1962, when the physiologist Bertil Sonesson from Lund demonstrated the nasal quality of speech recorded at 11 ata (300 feet). At that time a physiological explanation of the

resonance frequency is therefore limited by the shunting effect of the walls. When a vowel sound is produced, the participation of the vocal-tract walls in the resonance system shifts the formants upward in frequency, especially the first formant. This upward shift is more

frequency has to be corrected, which in turn implies a nonlinear frequency transformation. When a helium-oxygen mixture is used, this effect of the nonrigid vocal-tract walls is much less pronounced and is in fact

pling between the body wall and the lead and the existence of other wall shunts, like the soft palate and the sinus piriformis down to the trachea.
7. HELIUM SPEECH UNSCRAMBLER SIMULATIONS AT THE SPEECH
TRANSMISSION LABORATORY

by  Thomas  Murray 


Introduction. In 1969 we started to simulate helium-speech unscramblers on our CD 1700 computer. The computer can store the speech signal to be processed on a CD 853 disc storage unit at a maximum sampling rate of 18 kHz, which gives a bandwidth of 9 kHz. Of course, we can also store the data at double real time to decrease the needed bandwidth.


Figures 3 through 6 show the actual formant frequencies in different parts of a system as a function of the normal formant frequencies in air at sea level. All of them first show the theoretical distortion curve for speech in 94% He and 6% O2 at a depth of 660 feet. At this depth the pressure is 21 ata, and the lowest formant frequency for a certain diver is Fw = 0.73 kHz (Figure 3). The helium gas mixture causes a linear shift of the formant frequencies by a factor of 2.3. The expression for the combined effect is shown in Figure 3. Both vocoder and time domain methods were simulated.

Fig. 5. (Murray) Restoration of divers' speech with methods in the frequency domain.
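The figures themselves are not reproduced here. As an assumption, the sketch below uses a form often used in the helium-speech literature, Fd = sqrt((k Fa)^2 + Fw^2), with the two values quoted above (k = 2.3, Fw = 0.73 kHz). Treat it as an illustration of the general shape of such a curve, not as the expression plotted in Figure 3.

```python
import math

K_HELIUM = 2.3       # linear gas factor quoted for 94% He / 6% O2 at 660 ft
F_WALL_HZ = 730.0    # Fw quoted for 21 ata

def divers_formant(f_air_hz: float, k: float = K_HELIUM, f_wall_hz: float = F_WALL_HZ) -> float:
    """Assumed distortion curve: sqrt((k * F_air)^2 + Fw^2)."""
    return math.hypot(k * f_air_hz, f_wall_hz)

for f_air in (250.0, 500.0, 1000.0, 2000.0, 3000.0):
    f_d = divers_formant(f_air)
    print(f"{f_air:6.0f} Hz in air -> {f_d:6.0f} Hz (ratio {f_d / f_air:.2f})")
```

Under this assumption, a 250 Hz formant is raised by a ratio of about 3.7, while formants above 2 kHz are raised by close to 2.3. The curve never falls below Fw.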

Vocoder Methods (VM). Two vocoder methods were simulated. The center frequencies of the synthesis filters were equally spaced along a technical mel scale. Both 11 and 16 synthesis filters were used (Figure 4).

Fig. 6. (Murray) Restoration of divers' speech with a combined heterodyne method and a time domain method.

In the Voice Excited Vocoder, the rectified and smoothed signal from each analysis filter was modulated by a filtered and clipped speech signal. We tried to prevent the original formant frequencies from leaking through the modulator in two ways: (1) by using a very heavily clipped signal for modulation, and (2) by modulating the different channels with signals that are filtered differently before the final clipping.

In spite of these attempts, the original formants tend to leak through this type of unscrambler.

In the Pitch Voice Excited Vocoder, all channels are modulated by the same signal, which comes from a one-shot. To optimize the pitch synchronism, a few methods of triggering the one-shot have been tested.

Time Domain Methods. In the Time Domain Method (TDM), micro-segments of the divers' speech are stretched by the transposition factor k. To do this, some parts of the speech signal have to be thrown away. Even so, we believe the speech signal contains enough redundancy to carry the essential information.

Figure 5 shows the effect of this method in the frequency domain. As can be seen, the nonlinear pressure effect is only partly corrected. The error is reduced further if a heterodyne method (HM) first moves the formant frequencies down in parallel, before the TDM is applied. Figure 6 shows this combined method as three curves. Curve A shows the formants of the original divers' speech. Curve B shows the formants after the heterodyne. Curve C shows the formants after both the heterodyne and the time domain device have been applied. As can be seen, the combined method gives a better reduction of the pressure effect.
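To see why moving the formants down in parallel first can help, the sketch below compares two ways of restoring them. The first divides by k alone. The second applies a heterodyne shift h followed by division. Both are applied to an assumed distortion curve of the form sqrt((k Fa)^2 + Fw^2), using the k = 2.3 and Fw = 0.73 kHz quoted earlier. This is not the curve in Figures 5 and 6. The way h and the divisor are chosen here, by making two anchor formants come out exactly, is a hypothetical choice for illustration only.

```python
import math

K, F_WALL = 2.3, 730.0  # values quoted in the text for 21 ata

def divers_formant(f_air: float) -> float:
    """Assumed distortion curve (illustrative, not taken from the figures)."""
    return math.hypot(K * f_air, F_WALL)

def tdm_only(f_d: float, k: float = K) -> float:
    """Time-domain restoration alone: divide every frequency by k."""
    return f_d / k

def fit_hm_tdm(f_lo: float, f_hi: float):
    """Hypothetical choice of heterodyne shift h and divisor k such that
    (F_d - h) / k reproduces the air formant exactly at f_lo and f_hi."""
    d_lo, d_hi = divers_formant(f_lo), divers_formant(f_hi)
    k = (d_hi - d_lo) / (f_hi - f_lo)
    h = d_lo - k * f_lo
    return h, k

h, k = fit_hm_tdm(500.0, 3000.0)
for f in (250.0, 500.0, 1000.0, 2000.0, 3000.0):
    f_d = divers_formant(f)
    print(f"{f:6.0f} Hz: TDM only {tdm_only(f_d):6.0f} Hz, HM+TDM {(f_d - h) / k:6.0f} Hz")
```

In this illustration, division alone leaves a 250 Hz formant near 404 Hz, while the combined method brings it to about 306 Hz. Above about 1 kHz both methods stay within about 5 percent.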


Conclusion. We have not had the opportunity to compare these four methods on a very large amount of material. Nevertheless, in the test results obtained to date, the Time Domain Method without heterodyne seems to give by far the best intelligibility, even though the other methods sound more natural. The TDM is also very useful for other purposes, such as time compression and expansion. For both reasons, we have been encouraged to design a hardware device based on this principle.
8. DEVELOPMENTAL EVALUATIONS OF AND IMPROVEMENTS TO
A HELIUM-SPEECH UNSCRAMBLER*

by  R.  A.  Flower 


Introduction. The basic objective of the work reported here was to achieve further improvements in the performance of the Singer-Kearfott Helium Speech Unscrambler technique. The unscrambler is an electronic signal processor. It corrects, in real time, the voice distortions that arise from the use of helium in breathing gas mixtures for deep diving operations. A working model of the unscrambler exists in prototype form as a result of previously completed research and development activities.¹²,¹⁴

The areas under investigation include evaluation of a group of potential improvements in instrumentation, and evaluations by others of the prototype equipment. A description of these investigations and their current status is presented in the paragraphs that follow.

The evaluations of intelligibility reported here were preliminary in nature. They were intended only as a guide for deciding whether any of a group of possible modifications warranted incorporation in the Singer-Kearfott Unscrambler. These evaluations were based on the judgment of experienced investigators rather than on objective listening-panel methods. Results are therefore qualified by terms such as "subjective" and "apparent", and remain to be verified by formal tests.

*This report is based on "Annual Technical Report on Helium-Speech Investigations", R. A. Flower, the Singer Company, to the Office of Naval Research (Project N00014-00-C-0387), June 1971.

AREAS  OF  INVESTIGATION 


Compensation of Non-Linear Formant Shift. Previous studies¹³ have shown that in helium-distorted speech, the lower formant frequencies are shifted by a greater ratio at a given depth than the higher formant frequencies. For example, consider the predicted effects of depth on this pair of hypothetical formants:

Formant   Sea level   1000 ft    Ratio
Fx        400 Hz      1360 Hz    3.4:1
Fy        1600 Hz     4300 Hz    2.7:1

The ratio 2.7 is the asymptotic value of the frequency shift. It is a fair approximation for all formants having a sea-level frequency of 1 kHz or more. Many unscramblers, including our unit, apply a single frequency correction ratio, usually one close to the asymptotic value appropriate to the operating depth.

This produces undercompensation for the formants lowest in frequency, which suggests that intelligibility may be adversely affected. To investigate this, we devised and performed an experiment using recorded helium speech stimuli.
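The undercompensation can be checked directly against the table. The sketch below divides each formant at depth by the single asymptotic ratio and reports the error left relative to the sea-level value. The function name is hypothetical.

```python
# Sea-level and predicted 1000-ft formant frequencies from the table above (Hz).
FORMANTS = {"Fx": (400.0, 1360.0), "Fy": (1600.0, 4300.0)}
SINGLE_RATIO = 2.7   # the asymptotic ratio, applied to every formant

def restored_error_pct(sea_level_hz: float, depth_hz: float, ratio: float) -> float:
    """Percent error left after dividing a formant at depth by one fixed ratio."""
    restored = depth_hz / ratio
    return 100.0 * (restored - sea_level_hz) / sea_level_hz

for name, (sea, depth) in FORMANTS.items():
    print(f"{name}: shift {depth / sea:.2f}:1, "
          f"error after /{SINGLE_RATIO}: {restored_error_pct(sea, depth, SINGLE_RATIO):+.1f}%")
```

Fx comes out near 504 Hz instead of 400 Hz, about 26% high. Fy is restored to within about half a percent.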




The raw speech spectrum was divided into two parts by means of separation filters. The higher-frequency part was processed by one unscrambler, and the lower-frequency part was processed by a separate, identical unscrambler. A diagram of the apparatus is shown in Figure 7.

The filters were of a commercial type manufactured by the Krohn-Hite Corporation, Model 3500. They have a bandpass characteristic whose upper and lower cutoff frequencies are adjustable over the entire frequency range of interest. The attenuation beyond cutoff is at a rate of 24 dB per octave. The two unscramblers have an adjustable input/output frequency ratio in the approximate range 1.4:1 to 3:1.

The glottal sync circuit of the unscrambler¹³ operating on the low-frequency band was used to control the reset cycle of both unscramblers. This prevented the two reset periods from becoming unequal.


Various combinations of filter crossover frequency and unscrambler frequency ratios were tried. Each was compared subjectively with the performance of a single unscrambler covering the entire band. Among the dual-unscrambler combinations tried, the best intelligibility appeared to result with a crossover frequency of 2 kHz (lower band 1-2 kHz, upper band 2-10 kHz). In that combination, the lower-band input/output frequency ratio was about 1.25 times that of the higher band.
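The two-band arrangement amounts to a piecewise frequency map. The sketch below assumes the 2 kHz crossover and the 1.25 ratio between bands stated above. Each component's band is decided by its input (helium-speech) frequency. The upper-band ratio is a free parameter, and the function name is hypothetical.

```python
CROSSOVER_HZ = 2000.0   # crossover that gave the best results in the text
LOW_TO_HIGH = 1.25      # lower-band ratio is about 1.25 x the upper-band ratio

def two_band_map(f_in_hz: float, high_ratio: float) -> float:
    """Output frequency of one component in the two-unscrambler arrangement.

    Components below the crossover pass through the lower-band unscrambler,
    which divides by 1.25 * high_ratio; the rest are divided by high_ratio.
    """
    ratio = high_ratio * LOW_TO_HIGH if f_in_hz < CROSSOVER_HZ else high_ratio
    return f_in_hz / ratio

for f in (1000.0, 1500.0, 2500.0, 4800.0):
    print(f"{f:6.0f} Hz -> {two_band_map(f, high_ratio=2.4):6.0f} Hz")
```

In the example, the upper-band ratio of 2.4 is a hypothetical setting. It puts the lower band at 3:1, which is the top of the stated 1.4:1 to 3:1 adjustment range.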

This agrees with what one could predict from the non-linear formant shift characteristic. However, the optimum two-band instrumentation did not significantly improve apparent intelligibility. We conclude that a two-step correction of the non-linear characteristic is not promising. The potential of a multi-step or continuous correction remains an open question.

Unscrambler Artifacts. Ideally an unscrambler should produce at its output a voice waveform essentially identical to that of the same talker at sea level. Any practical device will, however, fail to reproduce perfectly all signal data appearing at its input, and in addition will add spurious components at its output. Such artifacts will, of course, vary widely in the severity of their effects on intelligibility. Some effects, such as noise and harmonic distortion, have been analyzed for voice communications systems in general. Others are unique to the device in question and are most readily assessed experimentally.

Fig. 7. (Flower) Diagram of the test apparatus for the non-linear formant shift.

We have therefore examined the major artifacts of the Singer-Kearfott unscrambler on an empirical basis. These include synchronization sensitivity to signal polarity, significant levels of intermodulation components in the output, and voice wave envelope modifications.

Synchronization Polarity. As originally designed, the glottal frequency detector,¹³ or pitch synchronizer, can generate a trigger on either an initial negative signal swing or an initial positive signal swing in each pitch period. This feature was incorporated to make the detector insensitive to the polarity (i.e., 0° or 180° phase) of the voice signal, which is normally unspecified. Proper operation also assumes that the first half cycle of the lowest formant frequency is always larger than all succeeding half cycles in any glottal period.

Experience with a variety of sources of helium speech signals has shown that, in some cases, the pitch synchronizer performance is relatively sensitive to its amplitude threshold setting. This appears to correlate with exceptions to the assumption stated above. That is, the first half cycle of the lowest formant is not always the largest in amplitude. The sync detector may then alternate between the first and second (or even third or later) half cycles, depending on total signal amplitude.

When this condition occurs, full synchronization capability can be restored by modifying the sync detector to operate on a unipolar basis. It then synchronizes only to an initial positive or an initial negative signal swing, depending on the selected polarity of the detector. Unipolar detection can therefore improve sync performance, but it requires the addition of a polarity selection control.
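The threshold sensitivity can be reproduced with a toy model in which each pitch period is reduced to the signed peak amplitudes of its successive half cycles. The values are hypothetical and are chosen so that the first half cycle is not the largest. The fixed threshold stands for the detector's amplitude setting, and the gain stands for total signal amplitude.

```python
from typing import List, Optional

def trigger_index(half_cycle_peaks: List[float], threshold: float,
                  polarity: Optional[int] = None) -> Optional[int]:
    """Index of the half cycle that fires the sync trigger in one pitch period.

    polarity=None models the original bipolar detector (either sign);
    polarity=+1 or -1 models a unipolar detector that accepts only swings of
    that sign. Returns None if no half cycle reaches the threshold.
    """
    for i, peak in enumerate(half_cycle_peaks):
        swing = abs(peak) if polarity is None else peak * polarity
        if swing >= threshold:
            return i
    return None

# Hypothetical pitch period whose first half cycle is NOT the largest one.
period = [0.6, -0.9, 0.5, -0.3]
for gain in (0.7, 1.0, 1.3):
    scaled = [gain * p for p in period]
    print(f"gain {gain}: bipolar -> {trigger_index(scaled, 0.6)}, "
          f"unipolar(-) -> {trigger_index(scaled, 0.6, polarity=-1)}")
```

As the gain rises, the bipolar detector jumps from the second half cycle to the first. The unipolar detector, set to negative polarity, keeps triggering on the same half cycle.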

Intermodulation Components. Examination of the unscrambler output wave has shown evidence of spurious signals in the normal voice frequency range. These components raise the output noise background level and thereby tend to reduce intelligibility. In the circuit under test, the principal source of intermodulation products was found to be high-frequency signals appearing at the unscrambler input (voice outputs, circuit noise and acoustical noise). These signals heterodyne with the unscrambler sampling frequencies.

The problem is accentuated by the use of high-frequency equalization (higher gain at higher frequencies) at the unscrambler input. The solution is to apply a low-pass filter with a sharp cutoff at 10 kHz. The filter passes all necessary helium voice frequencies and effectively excludes the components that lead to intermodulation products.
A four-pole Chebyshev filter was designed for this purpose. Subjective tests of the unscrambler with the filter added showed significant improvement.
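The text gives only the filter type, order and cutoff. The sketch below evaluates the textbook magnitude response of an analog Chebyshev type I low-pass, |H(f)|^2 = 1 / (1 + eps^2 T_n^2(f/fc)), for four poles and a 10 kHz cutoff. The 0.5 dB passband ripple is an assumed value, not one taken from the report.

```python
import math

def chebyshev_poly(n: int, x: float) -> float:
    """Chebyshev polynomial of the first kind, T_n(x), for x >= 0."""
    if x <= 1.0:
        return math.cos(n * math.acos(x))
    return math.cosh(n * math.acosh(x))

def cheby1_attenuation_db(f_hz: float, cutoff_hz: float = 10_000.0,
                          order: int = 4, ripple_db: float = 0.5) -> float:
    """Attenuation of an analog Chebyshev type I low-pass at f_hz, in dB.

    eps^2 = 10^(ripple/10) - 1; the cutoff is the edge of the ripple band,
    where the attenuation equals ripple_db.
    """
    eps2 = 10 ** (ripple_db / 10) - 1
    t = chebyshev_poly(order, f_hz / cutoff_hz)
    return 10 * math.log10(1 + eps2 * t * t)

for f in (5000.0, 10000.0, 12000.0, 15000.0, 20000.0):
    print(f"{f:7.0f} Hz: {cheby1_attenuation_db(f):5.1f} dB")
```

Within the passband, the loss stays at or below the assumed ripple. One octave above the cutoff it is roughly 30 dB. Farther out it grows by about 24 dB per octave, as for any four-pole low-pass.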

Voice Wave Envelope. In 1.0 atmosphere of air, the glottal wave has an envelope that rises rapidly and decays to zero relatively slowly during each pitch period. The unscrambler output signal envelope differs in that, because of pitch recycling,¹³ the decaying envelope never approaches zero amplitude. A test was therefore developed to determine whether this affects intelligibility.

The test circuit is shown in Figure 8. The unscrambler output is passed through a gain-control circuit having a very short time constant. The gain is varied at the pitch rate in accordance with the output wave of the function generator. With proper adjustment, the unscrambled voice waveform can thereby be corrected approximately to the normal envelope characteristics.

Subjective listening tests indicated that the intelligibility was about the same with or without the waveform correction. We conclude that this artifact has no significant effect on intelligibility.
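The envelope correction can be sketched in software. The sketch assumes the pitch marks are already known. It uses an exponential decay as a stand-in for the function-generator waveform, which the text does not specify. The function name and decay constant are hypothetical.

```python
import math
from typing import List, Sequence

def impose_decay(signal: Sequence[float], pitch_marks: Sequence[int],
                 fs: float, decay_ms: float) -> List[float]:
    """Multiply a signal by a gain that jumps to 1 at each pitch mark and then
    decays exponentially (time constant decay_ms) until the next mark.

    Samples before the first mark are left unchanged.
    """
    tau = decay_ms * fs / 1000.0            # time constant in samples
    marks = sorted(set(pitch_marks))
    out = list(signal)
    for j, start in enumerate(marks):
        end = marks[j + 1] if j + 1 < len(marks) else len(out)
        for n in range(start, min(end, len(out))):
            out[n] *= math.exp(-(n - start) / tau)
    return out

# Example: a steady signal with hypothetical pitch marks every 10 ms at 10 kHz.
shaped = impose_decay([1.0] * 200, [0, 100], fs=10000.0, decay_ms=2.0)
```

Because the gain restarts at every mark, the corrected envelope rises abruptly at the start of each period and decays toward zero before the next one. This is the shape the text attributes to the glottal wave in air.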

Automatic Gain Control (AGC). Optimum unscrambler performance requires that the input voice signal be maintained at a moderately fixed average voltage level. Variations over perhaps a 2:1 range are acceptable. However, we find that the voice output levels of any individual, and the differences in average levels between individuals, can be expected to exceed this range. As an interim expedient, we have been monitoring the voice levels with a VU meter and adjusting the gain manually to maintain the proper input level.

Fig. 8. (Flower) Schematic of the correction to the acoustic envelope of helium speech.

For any ultimate solution, however, an automatic means is needed to control the input level. We have therefore examined the problem of providing automatic gain control and have devised the following tentative specification:

Input level variation     30 dB
Output level variation    3 dB
Attack time               0.01 sec
Release time              1 sec
Idle gain                 30 dB below max. gain

We expect that these values can be realized within the current state of the art in circuit design. However, we do not plan at present to pursue the design and fabrication of the circuit.
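The tentative specification amounts to roughly a 10:1 compression of level changes (30 dB in, 3 dB out) with a 10 ms attack and a 1 s release. A minimal software sketch of such an AGC follows. The envelope follower, the reference level and the 30 dB gain cap are assumptions of the sketch, and the idle-gain requirement is not modelled.

```python
import math
from typing import List, Sequence

def agc(samples: Sequence[float], fs: float, attack_s: float = 0.01,
        release_s: float = 1.0, ratio: float = 10.0, ref_level: float = 0.3,
        max_gain_db: float = 30.0) -> List[float]:
    """Feed-forward AGC sketch: envelope follower plus a level compressor.

    The envelope follows |x| with a fast attack and a slow release time
    constant. Level changes relative to ref_level are divided by `ratio` at
    the output (30 dB in -> 3 dB out for ratio 10); the gain is capped at
    max_gain_db so that silence is not amplified without limit.
    """
    a_att = math.exp(-1.0 / (attack_s * fs))
    a_rel = math.exp(-1.0 / (release_s * fs))
    env = 0.0
    out: List[float] = []
    for x in samples:
        mag = abs(x)
        coeff = a_att if mag > env else a_rel          # rise fast, fall slowly
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(env, 1e-9) / ref_level)
        gain_db = min(-level_db * (1.0 - 1.0 / ratio), max_gain_db)
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

With this structure, two steady inputs 30 dB apart settle to outputs about 3 dB apart, as long as the gain cap is not reached.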

Low Quality Signals. We intended to investigate unscrambler performance with stimuli having relatively high background noise levels. However, we have found no available recorded signals with adequate parameter documentation. The plan is to record new material with the assistance of the Navy Experimental Diving Unit. A firm schedule remains to be worked out.

SUMMARY 


The basic objective of this work was to continue evaluations of, and achieve further improvements in, the Singer-Kearfott unscrambler technique. Investigations show that improvements result from two changes: using single-polarity rather than bipolar pitch sync detection, and excluding input signal components above 10 kHz. Simple compensation of the non-linear formant shift and modification of the glottal wave decay rate produced no apparent improvement.
9. THE ADMIRALTY RESEARCH LABORATORY PROCESSOR
FOR HELIUM SPEECH

by  J.  S.  Gill 


When diving to great depths, it is necessary to breathe a mixture of oxygen and helium in which the partial pressure of oxygen is maintained at the same magnitude as in air at sea level. Speech produced in this environment is badly distorted, and at depths greater than about 600 feet it becomes virtually unintelligible. The Admiralty Research Laboratory Helium Speech Processor was developed in 1969 to meet the Royal Navy's needs when diving throughout the range 0 to 2000 feet.

The principal cause of speech distortion in oxy-helium is the increased velocity of sound. Speech is produced by exciting the resonances of the vocal tract (see Figures 9 and 10). During voiced sounds the excitation is puffs of air from the larynx, and during unvoiced sounds it is turbulence at constrictions. The intelligence is conveyed by varying the vocal resonances and excitation. A typical male vocal tract is approximately 7 inches long. When it is relaxed (corresponding to the neutral vowel) and filled with air, its resonances occur at about 500, 1500, 2500 Hz, and so on. The frequencies of these resonances depend on the velocity of sound within the vocal tract. For example, when a diver is speaking at 1500 feet depth in helium-oxygen, these resonant frequencies are almost trebled and speech is completely unintelligible. The periodicity of the larynx excitation is virtually unaffected by the gas mixture, apart from the usual increase that occurs during stress.
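The figures quoted above follow from treating the relaxed vocal tract as a uniform tube closed at the larynx and open at the lips, whose resonances fall at (2n - 1)c/4L. The sketch below illustrates this under its own assumptions. Sound speeds are ideal-gas values at 37 °C, and air is taken as 79% nitrogen and 21% oxygen. The heliox oxygen fraction is set so that its partial pressure stays at 0.21 atm at 1500 feet, taking about 33 feet of seawater per atmosphere. Real-gas effects at high pressure are ignored.

```python
import math

R = 8.314  # gas constant, J/(mol K)
# Molar mass (kg/mol) and ideal-gas heat capacities Cv, Cp in units of R.
GASES = {
    "He": (4.0026e-3, 1.5, 2.5),   # monatomic
    "O2": (31.999e-3, 2.5, 3.5),   # diatomic
    "N2": (28.014e-3, 2.5, 3.5),   # diatomic
}

def sound_speed(mole_fractions, temp_k=310.0):
    """Ideal-gas speed of sound sqrt(gamma R T / M) for a gas mixture."""
    m = sum(x * GASES[g][0] for g, x in mole_fractions.items())
    cv = sum(x * GASES[g][1] for g, x in mole_fractions.items())
    cp = sum(x * GASES[g][2] for g, x in mole_fractions.items())
    return math.sqrt((cp / cv) * R * temp_k / m)

def tube_resonances(length_m, c, count=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, count + 1)]

L_TRACT = 7 * 0.0254                                  # 7 inches in metres
air = {"N2": 0.79, "O2": 0.21}
p_ata = 1 + 1500 / 33                                 # absolute pressure at 1500 ft
heliox = {"O2": 0.21 / p_ata, "He": 1 - 0.21 / p_ata}  # sea-level O2 partial pressure

print([round(f) for f in tube_resonances(L_TRACT, sound_speed(air))])
print([round(f) for f in tube_resonances(L_TRACT, sound_speed(heliox))])
```

In this sketch the air resonances land close to 500, 1500 and 2500 Hz. At 1500 feet the sound speed, and with it every resonance, rises by a factor a little under 3.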

Some distortion is caused by high ambient pressure; for example, the consonant-to-vowel ratios are reduced. By far the most significant distortion, however, is caused by the effect of the gas mixture on the resonant frequencies.
Fig. 9. (Gill) Diagram of a simplified model for a neutral vowel [ə].
Fig. 10. (Gill) Acoustic spectra showing vocal resonances for a neutral vowel


Recordings 

Before a processor could be developed, it was necessary to obtain good recordings of speech in oxy-helium at depth. We are indebted to LT J. Bladh, USN, and Petty Officers Cook and Frazer of the Royal Navy, who volunteered to dive to 800 feet for this purpose. This dive was carried out in the Deep Trials Unit at Alverstoke during August 1968. A hydrophone and a moving-coil microphone were used, and the signals were recorded on twin-track magnetic tape. Short-term spectrum analysis of these recordings showed clearly that the frequency response of the microphone varied with ambient pressure, whereas the hydrophone performed satisfactorily at all depths. The hydrophone was insensitive, having been designed to operate in water. This necessitated close speaking, which improved the signal-to-ambient-noise ratio. The hydrophone efficiency was greatest at maximum depth, where the ρc (characteristic acoustic impedance) of the medium was more nearly matched to the hydrophone. Care was taken to minimize the noise from the carbon-dioxide scrubbers.

Preliminary  Studies 

By late 1968 there was a substantial volume of literature concerning helium speech processing, but no known equipment met our requirements. We studied the available literature on band-shifting,⁴,²⁷,²⁸ vocoder²²,⁴⁸ and time-domain²,³⁹,⁵⁰,⁵¹ techniques, and investigated various possibilities by computer simulation.

The band-shifting technique was not adopted because, with a small number of channels, it was not possible to segregate and control the frequencies and bandwidths of the vocal resonances. The resulting speech was also anharmonic.

The vocoder method was simulated in both voice-excited and pitch-extracted versions. Acceptable quality, albeit "vocoder quality", was obtained. This solution was not adopted, however, because it would have involved substantial analogue circuitry and could not conveniently provide continuously variable frequency compression.

Time domain processing appeared to offer the attractive possibility of a processor with a continuously variable expansion ratio, constructed from existing integrated circuits. There is a long history of possible applications of Doppler scanning methods to bandwidth compression. Examples include French and Zinn1^, the German Tonschreiber (1939 to 1945), Gabor¹⁸, and Fairbanks, Everett and Jaeger². The first available reference to the application of these techniques to helium speech processing appears to have been Stover 5<?, followed by Brubaker and Wurst³ and WestinghouseH.

The first system to be simulated was based on the Gabor¹⁸ technique, using a pitch-synchronous scanning window approximately 50 milliseconds wide. This worked well but required a large store. The window width was therefore reduced to 4 milliseconds, and the quality still appeared acceptable, even when the scanning was not pitch-synchronous. This tentative conclusion was based on the simulation of only one short sentence. Later, when the breadboard model had been built, pitch synchronism proved to be essential.

Principles  of  Operation 

The basis of the method which has been developed at the Admiralty Research Laboratory20, 2* is to write sections of the speech into a temporary store and then to read them out at a lower rate (Fig. 11). During voiced sounds these sections are taken from the most intense part of each larynx period and the remainder is rejected. During unvoiced sounds the sections are taken less regularly and are more closely spaced. Because of the lower replay rate, the frequencies within each section are reduced in inverse proportion to the time expansion.
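The write-fast, read-slow principle can be illustrated with a short sketch. It assumes the section start points (one per larynx period during voiced sounds) are already known, models the slower read-out by linear interpolation, and simply sums sections wherever their expanded versions overlap. The function name and these choices are illustrative assumptions, not a description of the ARL hardware.

```python
def expand_sections(signal, section_starts, section_len, ratio):
    """Illustrative time-domain helium-speech correction (not the ARL circuit).

    Each section of `section_len` samples beginning at an index in
    `section_starts` is replayed `ratio` times more slowly and written back
    at its original position, so the spacing of the sections (the larynx
    period) is preserved while every frequency inside a section is
    divided by `ratio`.
    """
    if ratio < 1.0:
        raise ValueError("expansion ratio must be at least 1:1")
    out = [0.0] * len(signal)
    stretched_len = int(section_len * ratio)
    for start in section_starts:
        section = signal[start:start + section_len]
        if not section:
            continue
        for n in range(stretched_len):
            # Reading at 1/ratio of the writing rate: output sample n is taken
            # from fractional position n / ratio within the stored section.
            pos = n / ratio
            i = int(pos)
            if i >= len(section) - 1:
                value = section[-1]
            else:
                frac = pos - i
                value = section[i] * (1.0 - frac) + section[i + 1] * frac
            if start + n < len(out):
                out[start + n] += value  # overlapping expanded sections are summed
    return out
```

With `ratio=2.0`, a tone sampled at 10 samples per cycle inside a section comes out at about 20 samples per cycle, i.e. at half its original frequency, while the section start points, and hence the larynx periodicity, are unchanged.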
Fig. 11. (Gill) Waveform diagrams describing the principle of operation of the Admiralty Research Laboratory Processor.


The section length must be less than the shortest larynx period, otherwise the larynx periodicity will be destroyed, but it must be sufficiently long to ensure that the first formant of the helium speech is adequately defined. In this application a length of 2.5 milliseconds was chosen to allow operation at larynx frequencies up to 400 Hz. The expanded sections overlap whenever the product of section length and expansion ratio exceeds the larynx period.

Four temporary stores are used, and at any time one is being written whilst the remainder are being read. This enables the expansion ratio to be varied over the range 1:1 to 3:1 whilst retaining all of the sampled sections. Operation at larger expansion ratios would call for an increase in the number of temporary stores in order to maintain the whole of the 2.5 millisecond sections.

Longer sections, for example whole larynx periods, could be accommodated by increasing the lengths of the stores and incorporating fast-shifting between read-in and read-out, but this has not been necessary. The technique of using the most intense portion of each larynx period is advantageous when operating at a low signal/noise ratio.


The  Processor 

The system is outlined in Fig. 12. Signals from the transducer are filtered by a 12 kHz low-pass filter and then pass through an adjustable shaping network to the larynx pulse detector and analogue-to-digital converter. This adjustable network can provide up to 20 dB lift in the upper part of the spectrum to enhance the unvoiced sounds.

Fig. 12. (Gill) Schematic diagram of the Admiralty Research Laboratory Convertor.

The larynx pulse detector selects the most intense 2.5 millisecond-long section of each larynx period. These sections are encoded at 30 kilosamples/sec into 8-bit PCM and read sequentially into shift-register stores. During voiced sounds each one of the four banks of storage contains the first 2.5 milliseconds of a larynx period. During unvoiced sounds the monostable fires almost continuously and most of the speech is stored.
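The section-selection step can be pictured as a sliding-window energy search. The sketch below assumes that one larynx period's samples are already isolated and returns where the most intense 2.5 ms window begins. The 30 kilosample/s rate and the 2.5 ms length come from the text; the energy (sum of squares) criterion and the function name are assumptions, since the ARL unit does this with a larynx pulse detector and a monostable rather than in software.

```python
SAMPLE_RATE_HZ = 30_000        # encoding rate given in the text
SECTION_SECONDS = 0.0025       # 2.5 ms section length given in the text
SECTION_LEN = round(SAMPLE_RATE_HZ * SECTION_SECONDS)  # 75 samples


def most_intense_section(period, section_len=SECTION_LEN):
    """Start index of the highest-energy window of `section_len` samples
    within one larynx period (illustrative stand-in for the detector)."""
    if section_len > len(period):
        raise ValueError("section is longer than the larynx period")
    energy = sum(x * x for x in period[:section_len])
    best_start, best_energy = 0, energy
    for start in range(1, len(period) - section_len + 1):
        # Slide the window by one sample: add the new sample, drop the old one.
        energy += period[start + section_len - 1] ** 2
        energy -= period[start - 1] ** 2
        if energy > best_energy:
            best_start, best_energy = start, energy
    return best_start
```

For example, a 200-sample period at 30 kilosamples/s corresponds to a larynx frequency of 150 Hz, leaving room for the 75-sample window to slide across 126 candidate positions.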

Read shift pulses are applied to the 3 banks of storage which are not currently being written, and these signals are interleaved and decoded at four times the read shift rate. The signals from the digital-to-analogue convertor are passed through a 4 kHz low-pass filter which averages the samples from the three stores and removes the unwanted products of the sampling process. The variable frequency oscillator which controls the overall expansion ratio is adjustable over the range 120 Hz to 40 kHz to provide any ratio which may be required within the range 1:1 to 3:1.
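The figures above fit together with some simple arithmetic. The sketch below assumes the worst case in which sections are taken back to back (spacing equal to the section length, roughly as in unvoiced sounds), so that each section, replayed over `ratio` times its length, overlaps the following ones; the helper names are hypothetical. Under that assumption it reproduces the four stores needed for a 3:1 ratio, a 75-sample (600-bit) store per section, the 400 Hz larynx-frequency limit, and the 4 kHz output band obtained by compressing a 12 kHz input by 3. It does not attempt to derive the oscillator range quoted in the text.

```python
import math

SAMPLE_RATE_HZ = 30_000
BITS_PER_SAMPLE = 8
SECTION_MS = 2.5
INPUT_BANDWIDTH_HZ = 12_000


def sections_overlap(section_ms, ratio, larynx_period_ms):
    """Expanded sections overlap when section length x ratio exceeds the period."""
    return section_ms * ratio > larynx_period_ms


def stores_needed(ratio):
    """Back-to-back sections: ceil(ratio) stores being read plus one being written."""
    return math.ceil(ratio) + 1


def section_store_bits(section_ms=SECTION_MS):
    """Bits needed to hold one section of 8-bit PCM."""
    samples = round(SAMPLE_RATE_HZ * section_ms / 1000)
    return samples * BITS_PER_SAMPLE


def output_bandwidth_hz(ratio):
    """Dividing every frequency by `ratio` shrinks the input band by the same factor."""
    return INPUT_BANDWIDTH_HZ / ratio
```

Here `stores_needed(3.0)` gives 4, matching the text, while `stores_needed(3.5)` gives 5, which illustrates why larger expansion ratios would call for more stores.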

This model operated successfully throughout the record-breaking dive to 1500 feet made by John Bevan and Peter Sharphouse of the RNSS at RNPL, Alverstoke, during March 1970. The processed speech was at all times highly intelligible.

Comment 

The possibility of using analogue shift registers was contemplated at the breadboard stage of the project, but reliable digital components were readily available and were consequently used.

Improved methods of analogue storage have recently been developed commercially, and at least one of these, the "bucket-brigade delay line", may provide an alternative technique for a processor based on the above principles.
10. AN INVESTIGATION OF HeO2 SPEECH UNSCRAMBLERS UNDER CONTROLLED CONDITIONS

by H.B. Rothman and H. Hollien


INTRODUCTION 

One of the major obstacles to man's exploration of the oceans is the inadequacy of voice communications among divers and between divers and surface support personnel. It is well established that a talker in an environment of high helium concentration and under high ambient pressure experiences severe distortions in the intelligibility of his speech. Although the fundamental frequency of the talker's voice is not affected, there is an upward shift of the formant frequencies (a formant being defined as a region of maximum acoustic energy in the spectrum; the formant pattern specifies the vowel). This upward shift, which is nonlinear at low frequencies and linear at high frequencies, is one of the causes of distortion in the production and perception of speech.

Helium speech unscramblers, designed to improve the intelligibility of speech distorted by breathing helium/oxygen mixtures, have been developed. Generally, these systems attempt to process, in various ways, the grossly unintelligible speech resulting from the effects of HeO2 breathing mixtures and high ambient pressure, and to reconstruct such signals so as to provide adequate oral communication. Two general methods are used for HeO2 unscrambling: 1) frequency domain processing, in which the spectrum of the signal is manipulated, and 2) time domain processing, in which the time-varying signal is manipulated. Some unscramblers use combinations of both techniques.

Several investigators2, 19, 31, 43 have studied the formant frequency shift of vowels in HeO2 and found it to be close to predicted levels. Closer agreement between the observed and predicted shifts would probably have been reached were it not for the fact that many of the tests were done with subjects breathing HeO2 at atmospheric pressure; therefore, changes due to pressure did not occur. While some of these studies2, 50 provide much needed base-line information, they are of limited value since they examined the formants of vowels produced in isolation. Other investigations, by Copel4 at the Naval Applied Science Laboratory (NASL) and by Brubaker and Wurst3 of the Singer Company, have studied only F3 and F1, respectively. The Singer group also investigated consonant-vowel amplitude ratios, which are elevated; i.e., the consonant peaks are depressed in relation to the vowel amplitudes. However, no one has looked at transitions, and studies of the acoustic characteristics of speech indicate that it is often the transitions between consonants and vowels that carry the information relevant for perception. For example, researchers at Haskins Laboratories5 have shown that the direction and duration of transitions, particularly for the second formant, play a substantial role in perception.
It is obvious from the above that many different types of speech materials, recorded under a wide variety of conditions, have been used to obtain data on the distortions caused by HeO2 and on the corrective ability of various unscramblers. Because of the disparity between the methodologies employed, it is difficult to compare the data in a meaningful manner. To correct this situation, the Communication Sciences Laboratory has undertaken to develop standardized procedures, as far as is possible, for testing the performance of HeO2 speech unscramblers. To develop an experimental procedure of this nature, three sets of protocols are specified. First, the exact nature of the equipment to be evaluated is determined as far as possible; second, speech intelligibility tests are conducted using standardized speech material equated for difficulty; and finally, an error analysis is carried out.

This paper presents results on speech intelligibility based on the performance of HeO2 unscramblers through 1970. The speech material used was collected primarily at the Navy's Experimental Diving Unit (EDU), Washington, D.C., on various occasions and under different conditions. This report will first present data collected "on-line" at EDU. Second, we will describe the development of our "standard" test for HeO2 unscramblers. Third, we will present data from our "off-line" evaluations (using the standard test).

On-Line Evaluation of HeO2 Unscramblers

The first set of recordings was obtained during Sealab-3 training dives at EDU. The facility used may be seen in Figure 13; it consists of two large cylinders. The one on the right is a horizontal two-lock chamber and the other is a vertical two-level chamber with the bottom section extending well below the level of the horizontal cylinder. Thus, three of the four units are living, dry work (igloo) and "wet" work sections (the lower level of the vertical cylinder is usually filled with water); the outer chamber serves as an emergency "lock-in."

In order to make recordings of the divers' speech without excessive reverberation from the steel walls of the chamber, it was necessary to provide an area surrounded by acoustically absorptive materials. Since space considerations precluded the introduction of an "acoustically isolated chamber" into the already crowded habitat, environmental modifications were accomplished by using the fiberglass-filled mattresses from a set of bunks to form an enclosure. This enclosure, with a fiberglass-filled pillow in the rear and the talker acting as his own baffle at the front, served as a recording chamber.

Fig. 13. (Rothman & Hollien) Schematic diagram of the hyperbaric facility at the Navy Experimental Diving Unit, Washington, D. C.

To be properly conducted, assessment of system performance should follow set procedures; hence protocols were specified. They are as follows:

1) more than one (preferably 3-4) talkers should be used for each system and configuration, because talker variability could otherwise bias the results;

2) standardized word lists (equated for difficulty) should be used, for much the same reasons;

3) all talkers should read at least two word lists via all unscrambler/microphone combinations;

4) each talker should read via a specific microphone through all unscramblers (and unprocessed) simultaneously (see Figure 14);

5) at least 10 listeners (preferably 15) should be used to obtain the intelligibility levels.
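The five protocol criteria can be written down as a checklist applied to a session's design. In the sketch below the `RecordingDesign` record and its field names are hypothetical; the thresholds (at least two talkers, standardized lists, at least two lists per unscrambler/microphone combination, simultaneous recording through all units, at least 10 listeners) are taken from the list above.

```python
from dataclasses import dataclass


@dataclass
class RecordingDesign:
    """Hypothetical summary of how an evaluation session was run."""
    talkers: int
    standardized_lists: bool          # word lists equated for difficulty
    min_lists_per_combination: int    # fewest lists any talker read via any unscrambler/microphone pair
    simultaneous_all_units: bool      # each talker/microphone fed all unscramblers and unprocessed at once
    listeners: int


def unmet_criteria(design):
    """Return the numbers (1-5) of the protocol criteria that the design fails."""
    unmet = []
    if design.talkers < 2:
        unmet.append(1)
    if not design.standardized_lists:
        unmet.append(2)
    if design.min_lists_per_combination < 2:
        unmet.append(3)
    if not design.simultaneous_all_units:
        unmet.append(4)
    if design.listeners < 10:
        unmet.append(5)
    return unmet
```

With illustrative values, `unmet_criteria(RecordingDesign(3, True, 0, False, 15))` returns `[3, 4]`, the pattern reported for the second project described in this paper.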

Evaluation  of  Responses 

Tapes of the diver/talkers' responses were spliced to allow three- to five-second intervals between words. These tapes were played to a minimum of ten semi-trained listeners, i.e., University of Florida students selected on the basis of 1) being native speakers of English, 2) having normal hearing and 3) being capable of performing the required listening tasks. Before hearing the tapes, listeners are required to score at least 92% on a screening test which included 50 words from CID Auditory Word List A-3 (Hirsh recording) recorded in +10 dB of thermal noise, 25 words recorded in a HeO2 environment, 25 words from diver communication system recordings and 50 words from CID Auditory Word List 4-A. The final 50 words constitute the screening test.
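On the 50-word screening test, the 92% criterion corresponds to a whole-word cutoff. The helper names below are hypothetical; integer arithmetic is used so that a score exactly at the threshold is not misjudged by floating-point rounding.

```python
def passes_screening(words_correct, words_in_test=50, pass_percent=92):
    """True if the listener scored at least pass_percent on the screening words."""
    return words_correct * 100 >= pass_percent * words_in_test


def min_words_to_pass(words_in_test=50, pass_percent=92):
    """Smallest whole number of correct words meeting the threshold (ceiling division)."""
    return -(-pass_percent * words_in_test // 100)
```

A listener therefore needs at least 46 of the 50 screening words; 45 correct (90%) fails.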
Fig. 14. (Rothman & Hollien) Schematic diagram showing the recording set-up used at the Navy Experimental Diving Unit. The outputs of the unscramblers were recorded on a Honeywell FM tape recorder and unprocessed speech was recorded simultaneously on an Ampex 601 tape recorder.


Each listener was asked to write down the words he heard from the tapes of divers' speech recorded from the output of the several unscramblers. Listener responses were scored for the number of words correct; the average percentage of words correct for each unscrambler was then used as its overall intelligibility score.
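The scoring rule amounts to averaging each listener's percent-correct figure. This sketch assumes responses are compared position by position with the word list and that a word counts only on an exact, case-insensitive match; the report does not state how near-misses were handled, and the function names are hypothetical.

```python
def percent_correct(written, word_list):
    """Percent of list words a listener wrote down correctly.

    Assumes position-by-position comparison and exact (case-insensitive) matches.
    """
    hits = sum(
        1 for heard, target in zip(written, word_list)
        if heard.strip().lower() == target.lower()
    )
    return 100.0 * hits / len(word_list)


def intelligibility_score(responses_by_listener, word_list):
    """Mean percent correct across listeners, used as one unscrambler's score."""
    scores = [percent_correct(r, word_list) for r in responses_by_listener]
    return sum(scores) / len(scores)
```

For instance, one listener with all four words of a short list correct (100%) and another with two correct (50%) give an intelligibility score of 75%.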


Table 1 provides the mean intelligibility scores for processed and unprocessed speech as well as a comparison of the Roanwell and Electrovoice 664 (EV-664) microphones. The Roanwell microphone was designed to be noise-cancelling, to operate under high ambient pressures and specifically to be the input transducer for the NASL unscrambler.
Table 1. Comparison of mean intelligibility scores for four diver/talkers in HeO2 at 600 feet using the Roanwell and Electrovoice 664 microphones. Recordings of unprocessed speech were made simultaneously with those processed through three unscramblers situated on-line. Each diver score is the mean of four PB25 lists. N = at least 10 listeners in each case.

                                   Unscramblers
Microphone    Unprocessed    H.R.B. Singer    NASL    Westinghouse
Roanwell      20.0           22.7             52.2    38.5
EV 664         7.5           10.7             27.6    27.0

As shown in Table 1, mean intelligibility scores for the EV-664 range from 5% for unprocessed speech to 27.6% for the NASL unscrambler. Mean intelligibility scores for the Roanwell range from 20% for unprocessed speech to 52.2% for the NASL.

From the data presented in Table 1, it is clear that under the operating conditions of this particular study, the Roanwell microphone proved to be a superior input transducer in all cases. It was expected to enhance the performance of the NASL unscrambler, since it was designed specifically for that unit. However, it also substantially increased the intelligibility when used with the other units. A mean intelligibility level of 52% cannot be regarded as satisfactory, although it begins to approach a level at which at least some intelligible voice communication can be expected between aquanauts situated in a chamber and the support groups on the outside.


In addition to the evaluation of the above unscramblers as recorded in a dry chamber, it was necessary to investigate their effectiveness using various input configurations such as would be used by a diver in the water. Obviously, the addition of a restricted cavity to the vocal tract produces profound acoustic changes in the resulting utterance; so does the interface of the diver's head with the water. This substudy was designed to add to the stockpile of information already gathered on the unscramblers and to examine the relative effect of two available divers' masks: the Scott and the MDL. Table 2 indicates that the unprocessed intelligibility for both masks is near the levels obtained in the chamber with the EV-664 microphone; that the use of the Scott mask results in higher levels of intelligibility with and without the aid of the unscramblers; and, as stated above, that the NASL unscrambler provides the greatest improvement.


Table 2. Mean scores for the Scott and MDL masks at 600' in the wet pot at EDU. Each diver read two word lists on two different days. Means are corrected for unequal listener N's.

                           HeO2 Unscramblers
Mask     Unprocessed    HRB Singer    Westinghouse    NASL
Scott    10.1           4.5           20.1            28.1
MDL       5.2           3.7           10.1            10.5

A second project was conducted to evaluate the performance of three HeO2 speech unscramblers, or processors, used with five microphones. The unscramblers were those designed and fabricated by (1) Industrial Research Products, Inc. (IRPI), (2) the Raytheon Company and (3) Singer-General Precision, Inc. (G-P/Singer) (this unit is a newer and different model than the H.R.B. Singer). The microphones used in the evaluations were the following: (1) Industrial Research Products, Inc.; (2) Singer-General Precision; (3) Electrovoice 664; (4) U. S. Navy Mark-8; and (5) U. S. Navy Mark-11. Figure 14 is a schematic of the recording array.

Criteria 3 and 4 of the protocols mentioned previously were not met in this study; i.e., not all talkers read at least two word lists via all unscrambler/microphone combinations, and the talkers did not read via a specific microphone through all unscramblers (and unprocessed) simultaneously.

Table 3 presents the 97 scores obtained individually; each is based on at least 15 listeners. Table 4 is a summary of the grouped data obtained from Table 3. It will be noted that the overall means are based on varying numbers of scores. That is, 30 lists were read for the IRPI unit and for unprocessed speech, 25 for the G-P/Singer, 10 for the Raytheon and only two for the NASL unit.


Rather extensive variability among the individual scores can be noted in Table 3. For example, scores range from 4.8% to 41.3% for unprocessed speech, from 5.9% to 65.1% via the IRPI unscrambler, and from 4.0% to 52.9% via the G-P/Singer. Variability is not uncommon in studies of this nature; it can result from such factors as 1) differences in recording procedure, 2) noise or improper grounding of equipment, 3) talkers and 4) word lists (such effects can be intermittent or cumulative). However, the variation among the individual scores in this evaluation is considered somewhat excessive. On the other hand, this talker variance can be explained to some degree; much of it seems due to the four
Table 3. Raw data from the evaluations of the four HeO2 speech unscramblers. Scores are based on the percent correct responses from 15 listeners for each talker/microphone/list/unscrambler condition.


Unproc- 


MEAN 


GP/Singer 


MEAN 


EV-664-1 


MEAN 


EV-664-2 


MEAN 


Harder  T-3 
N-8 

Morey  0-8 
N-8 
M-10 
P-4 

Moore  P10 
T-8 


Morey  0-8 
N-8 


Harder  T-3 
N-8 

Morey  P-4 


Moore 


Harder  T-3 
N-8 

Morey  P-4 


Moore 


Unscramblers 

IRPI 

G- P/Singer 

Raytheon 

NASL 

j 

48.8 

48.6 

_ 

39.1 

- 

39.7 

- 

40.3 

- 

- 

30.7 

30.4 

16.8 

- 

32.8 

- 

- 

54.4 

- 

- 

- 

61.6 

- 

54.6 

- 

58.6 

- 

47.5 

- 

57.8 

- 

43.4 

29.6 

35.1 

16.8 

18.3 

26.3 

53.4 

31.8 

P-4 

M-10 

T-8 

P-10 


P-4 

M-10 

T-8 

Q-4 

P-10 


49.6 

48.8 
65.1 

56.8 
44.3 
35.5 


50.0 


Navy  Mark  8  Morey  P-3 

Q-10 

Moore  T-4 
Q-10 

MEAN 


30.1 

15.0 

42.7 
33.9 
23.5 

18.7 


Table 3 (continued).

                                                   Unscramblers
Microphone      Talker   List   Unprocessed   IRPI   G-P/Singer   Raytheon   NASL
Navy Mark 11    Harder   O-3     5.1          -      22.3         32.6       -
                         S-5     5.1          50.1   52.9         -          -
                         Q-10    -            31.3   -            -          -
                Morey    S-5    16.8          33.4   22.7         -          -
                         T-4    27.4          22.8   37.2         -          -
                Moore    O-3     6.2          15.7   18.1         -          -
                         T-4    10.7          13.2   12.8         -          -
                MEAN            11.9          27.8   27.7         32.6       -

Category Means                  15.9          32.5   20.3         39.6       31.8
Individual Means                15.7          32.7   21.5         45.1       31.8

Table 4. Summary table of unscrambler evaluation. Mean scores of words correct for each HeO2 unscrambler and microphone. Diver/talker depth was 650' in HeO2. N = 15 listeners for each PB25 Campbell word list. Numbers in parentheses give the number of scores in each mean.

                                            Unscramblers
Microphones        Unprocessed   IRPI          G-P/Singer   Raytheon   NASL
IRPI               19.1 (6)      43.4 (6)      16.8 (1)     53.4 (6)   31.8 (2)
Singer             20.2 (2)      32.4 (2)      22.3 (2)     -          -
EV-664 (1)         23.6 (6)      50.0 (6)      27.3 (6)     -          -
EV-664 (2)          9.5 (6)      12.6 (6)      17.5 (6)     32.7 (3)   -
Mark-8             11.2 (4)      (illegible)   10.2 (4)     -          -
Mark-11            11.9 (6)      27.8 (6)      27.7 (6)     32.6 (1)   -
Category means     15.9          32.5          20.3         39.6       31.8
Mean, all scores   15.7          32.7          21.5         45.1       31.8
Number of lists    30            30            25           10         2
conditions listed above, and the balance to the sharp differences between the data collected by the authors (see EV-664-1) and those provided by others (see the remaining data, but especially EV-664-2).

The main effects are apparent. First, none of the units improved speech intelligibility to levels that could be considered adequate for good communication. Hence, it cannot be said that any of them offers a solution to the HeO2 communication problem. On the other hand, at least two of the units (Raytheon and IRPI) improved speech intelligibility by substantial amounts. Indeed, the Raytheon was the superior performer and provided nearly 200% improvement; the IRPI was second with about 100% improvement. With respect to the G-P/Singer unit, it can be said that it performed considerably better than did the previous Singer attempts, but still not on the level of the other two units. Statistical evaluation of these data is difficult, since the scores for the three units tested (omitting NASL) are not based on random trials, nor can the assumption of homogeneity of variance be met. Nevertheless, the differences are significant when a relatively insensitive test is used.
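The "nearly 200%" and "about 100%" figures appear to be relative gains over the unprocessed mean. The sketch below assumes they were computed from the "Mean-all scores" row of Table 4; the text does not say which row was used, but that row reproduces both figures. The function and constant names are hypothetical.

```python
def percent_improvement(processed, unprocessed):
    """Relative gain of a processed score over the unprocessed baseline, in percent."""
    return 100.0 * (processed - unprocessed) / unprocessed


# "Mean-all scores" row of Table 4 (percent words correct).
TABLE4_MEAN_ALL = {"Unprocessed": 15.7, "IRPI": 32.7, "G-P/Singer": 21.5,
                   "Raytheon": 45.1, "NASL": 31.8}


def improvements(means=TABLE4_MEAN_ALL):
    """Percent improvement of each unscrambler over unprocessed speech."""
    base = means["Unprocessed"]
    return {unit: percent_improvement(score, base)
            for unit, score in means.items() if unit != "Unprocessed"}
```

Rounded, this gives about 187% for the Raytheon, 108% for the IRPI and 37% for the G-P/Singer. The same relative-difference formula, with one unscrambler's score as the baseline, applies to the paired comparisons reported below.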

It is necessary, at this point, to comment on the lack of similarity between the levels demonstrated by EV-664 (1) and those by EV-664 (2). Since the talkers and lists were essentially the same, and the same microphone was used, it would be expected that the scores would be similar or, due to adaptation, that the second set would provide higher (not lower) means than the first. In carefully reviewing both sets of tape recordings, striking differences were apparent. Specifically, the recordings made by the authors (EV-664 (1)) are of substantially higher quality than those made later. The tapes in the second trial exhibited both greater variation in amplitude and more system and habitat noise. Hence, it is strongly recommended that whenever such evaluations are made in the future, the group that has responsibility for the project should be allowed full control over all aspects of the investigation.

Comment is also necessary about microphone performance. First, the IRPI and G-P/Singer microphones performed at about the same level (even though the Singer scores are based on only two talker/lists, a tentative comparison is considered possible). Second, the EV-664, the Mark-8 and the Mark-11 also performed comparably. Finally, it can be noted that the two microphones in the first category (IRPI and G-P/Singer) provided mean scores that were about double those of the other three.

It is suggested, therefore, that the G-P/Singer and IRPI microphones possibly will permit superior operation of HeO2 speech unscramblers and should be considered for use in conjunction with such systems in future applications.

Additional information was obtained by comparing the three systems tested. Here, paired scores (same talker, word list and microphone, recorded simultaneously via the paired unscramblers) were used in order to obtain a further direct comparison of the units. There were 23 lists allowing comparison of the IRPI and G-P/Singer, four for the Raytheon and G-P/Singer, and six for the Raytheon and IRPI. In these comparisons the Raytheon showed superior performance (100% better than the G-P/Singer and 29% better than the IRPI). In turn, the IRPI unit performed 39% better than the G-P/Singer device. In any case, the rankings among the three units were of the same order and of comparable magnitudes as they were in the other tables (main effects).

Development  of  an  Off-Line  Test 

In the course of gathering data for the evaluation of helium speech unscramblers, the Communication Sciences Laboratory has collected a large inventory of speech materials representing a wide range of pressures and gas mixtures. Because of this extensive stockpile of speech material recorded in hyperbaric helium environments, it became possible to devise an off-line test for evaluating HeO2 unscramblers. The advantages of an off-line test are obvious; further, such a test is also particularly useful for the preliminary testing of an unscrambler without the necessity of an actual chamber dive. The basic criteria used for selecting material for the off-line test were that it be 1) rigorous and 2) representative of the varied conditions found in on-line situations.

The recordings comprising the test are "good" recordings, i.e., they were closely monitored to prevent any unrelated distortions from occurring. However, the actual problems faced by the unscramblers have been carefully included. For example, noise is always a factor, so noisy tapes are included. Further, the diver will be at various depths in a hyperbaric chamber or the open sea, and he will be breathing various mixtures of HeO2 through different mask and helmet configurations while using different microphones under varying conditions of ambient noise. Hence, such conditions must be used, and in this regard the off-line test is intentionally rigorous and favors no particular unscrambler. It must be remembered that the results of an unscrambler's performance become especially meaningful when its performance is compared to that of the other unscramblers.

To be specific, the various word lists chosen for the test represent as many varied conditions as possible and allow for a rigorous evaluation of the performance of HeO2 unscramblers. The criteria for the word list selection are as follows:

(1) High, medium and low intelligibility. Intelligibility scores for unprocessed word lists at depths of 200, 450 and 600 feet, obtained from previous studies, were divided into these three categories. For each depth, one list each was chosen to represent high and medium intelligibility, and two lists were selected for low intelligibility. All of these lists were the first recordings made upon reaching depth.

(2) Noise. Two recordings of word lists judged to be noisy were selected for each of the three depths. The rationale for including this material is that a noisy habitat is a "typical" situation.

(3) Last recordings before starting ascent (LBA). These lists reflect diver intelligibility after the diver has had a chance to modify his speech and has attempted to become more intelligible.

(4) Roanwell microphone at 600-825 feet.

(5) Wet dive, 600 feet. As all the above conditions occurred in a dry chamber, it was judged that this condition would add considerably to the evaluation of the performance of HeO2 unscramblers, since unscramblers will have to process divers' speech while they are working in the sea. In this case, the diver/talkers wore either the Scott or the MDL mask with the MDL microphone, and had been at depth for some time when these recordings were made.

The EV-664 microphone was used for conditions (1), (2) and (3). In all, 57 separate word lists are used in the CSL off-line unscrambler test.

Procedure 

In order to conduct bench tests of unscramblers, recorded phrases were played on an Ampex tape recorder whose line output fed the input of an unscrambler. The unscrambler output was fed into a Marantz amplifier coupled to a Marantz speaker. By monitoring every unscrambler through this same amplifier and speaker, we were able to keep the frequency response of the monitoring chain constant across all unscramblers. An attempt was made to "tune" the unscramblers according to the manufacturers' specifications. After reaching this point, three listeners performed a modified method of adjustment to determine the setting which produced the greatest intelligibility. This was done by repeatedly playing a given signal while bracketing the range of settings which gave best intelligibility. When the three listeners reached agreement, the unscrambler output was recorded on a second Ampex. This bracketing technique was carried out for each unscrambler, talker and condition. Input and output levels were carefully monitored to prevent distortion of the signal.
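The bracketing step can be pictured as a search that repeatedly discards the part of the adjustment range that sounds worse. The sketch below is only a numerical analogue of what the three listeners did by ear, written under stated assumptions: a single adjustment knob normalized to the range 0-1, a hypothetical `panel_score` function standing in for the listeners' consensus judgment, and a score that rises to a single peak and then falls. None of these names or values come from the report.

```python
def bracket_best_setting(score, low, high, tolerance=0.01):
    """Narrow [low, high] around the setting with the highest score.

    Assumes score() is unimodal on the interval (one peak, no plateaus),
    which is an idealization of a listening panel's judgments.
    """
    while high - low > tolerance:
        third = (high - low) / 3.0
        left, right = low + third, high - third
        if score(left) < score(right):
            low = left    # the peak cannot lie in [low, left]
        else:
            high = right  # the peak cannot lie in [right, high]
    return (low + high) / 2.0


# Hypothetical panel response: intelligibility peaks at a knob setting of 0.62.
def panel_score(setting):
    return 40.0 - 100.0 * (setting - 0.62) ** 2


best = bracket_best_setting(panel_score, 0.0, 1.0)
```

With a real panel each call to `score` is a replay-and-judge trial, so the tolerance trades listening time against the precision of the final setting.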

Results 

Four unscramblers were tested for this particular run: the Integrated Electronics Corp. (old NASL), IRPI, Raytheon and Singer/G-P units.

As may be seen in Table 5, the previous study showed the Raytheon unscrambler to provide the most improvement; the present study provides data which reverse this finding. That is, against an unprocessed score of 14.9%, intelligibility with the Integrated Electronics Corporation unit is 24.6%; with the IRPI, 31.7%; with the Raytheon, 20.0%; and with the Singer/G-P, 30.8%.
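The gains implied by these figures can be worked out directly. The sketch below assumes, as the Table 5 title indicates, that the four figures are intelligibility levels rather than increments, and subtracts the unprocessed score from each to express the gain in percentage points.

```python
# Mean intelligibility (%) for each unscrambler, as reported with Table 5.
UNPROCESSED = 14.9
scores = {"IEC": 24.6, "IRPI": 31.7, "Raytheon": 20.0, "Singer/G-P": 30.8}

# Gain over unprocessed speech, in percentage points (rounded to one decimal).
gains = {unit: round(score - UNPROCESSED, 1) for unit, score in scores.items()}

# Units ordered from highest to lowest mean intelligibility.
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this reading the gains are about 16.8 points (IRPI), 15.9 (Singer/G-P), 9.7 (IEC) and 5.1 (Raytheon), so even the best unit leaves mean intelligibility near one third.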

Table 6 presents a breakdown of the data by depth and condition. A look at the scores for the Raytheon unit shows that its performance deteriorated at greater depths, especially when used with the Roanwell microphone and during the wet dive. A reexamination of Table 3 will show that, in the data from the unscrambler evaluation done at the Westinghouse facility, the high scores for the Raytheon unscrambler were primarily the result of its use with the IRPI microphone.

At present, we are continuing our off-line evaluation of unscramblers.

We should be receiving tapes of our test processed through prototype units developed by Necton Bylinnium in Colorado, and by Gunnar Fant at the Royal Institute of Technology, Stockholm, Sweden. Other units we are hoping to receive data on include the Helle Engineering Company's Hellephone and the Standard Telecommunications Laboratory Ltd. unit from Britain.

Table 5. Intelligibility level for the four unscramblers tested during the summer of 1970. Means are not corrected for unequal listener N's but have been equated for unequal number of lists read.

                       IEC    IRPI   Raytheon   Singer        Unprocessed
Mean (%)               24.6   31.7   20.0       30.8          14.9
Number of listeners    755    781    785        (illegible)   (illegible)

Fig. 15. (Rothman & Hollien) Schematic diagram of the recording set-up used at Westinghouse. This array enabled all microphones to be used with all unscramblers, with simultaneous recordings made of unprocessed speech.

Table 6. Intelligibility levels (in percent) obtained for four HeO2 unscramblers under the following varied conditions. Scores were not corrected for unequal listener N's, but have been equated for unequal number of lists used.

* Only five lists were used.
** 13 lists were used.

Although not all the data on all unscramblers are in, the following generalizations can be made with confidence about unscramblers up to 1970: 1) none of the unscramblers provides enough speech improvement to allow adequate diver-to-diver or diver-to-surface communication; 2) of the units tested, the IRPI performed best overall; 3) the IRPI and Singer/G-P microphones appear to be similar in performance and superior to the EV-664, the Mark-8 and the Mark-11 microphones; and 4) whenever such evaluations are conducted in the future, rigorous and exacting procedures must be followed if valid performance levels are to be determined.


What, then, is the current state of the art? Obviously improvements will be made, and are being made, in HeO2 unscramblers. Some of these improvements will occur because design engineers are becoming more aware of the complexities of speaking in the pressurized helium environment and are beginning to take basic research into consideration. We hope that our efforts in that regard, and our evaluations of unscramblers, will assist them.


Finally, a new aspect of our overall program is to study the human as a helium-speech decoder. Many divers report an increase in their ability to understand helium speech in a high ambient pressure environment after a period of time. Some of the data processors at CSL who have spent many hours listening to tapes of helium speech also report an increasing ability to decode this type of speech even though it is greatly distorted. Because of these indications, and because the human decoder should be a very efficient auditor, a promising area of research has opened. Hence, we are now developing techniques for identifying and training listeners to be efficient decoders of helium speech.
OPEN FORUM AND DISCUSSION PERIOD, CHAIRMAN'S SUMMARY

by  G.  C 


The topics discussed during this portion of the workshop could be classified under six general headings. The subject matter tended to be additions and addenda to underwater communication problems generally, rather than those strictly associated with helium-speech translators. The six topics were: (1) a more detailed explanation of the project concerned with the development and evaluation of a universal underwater communication system; (2) the human talker-listener factors of a total communication system; (3) considerations in the selection (if necessary) of diver-talkers; (4) possible methods of preprocessing the helium speech prior to translator processing which may enhance translator efficiency; (5) speaker-listener communication problems among pressure chamber subjects; and (6) factors inherent in quick solutions to operational problems.

NSRDL,  Panama  City  Underwater 
Communication  Project 

At the present time, Navy divers cannot communicate reliably by voice at any depth, even as shallow as 10 feet. A NavShips, SupSal, ONR project was initiated to take various components of underwater communication systems which had been developed and were available as "off-the-shelf" items in 1970, and to combine these components into a usable system. Each subsystem component would be made electronically compatible so that any combination of three microphones, three face masks, two helmets, several acoustic and wired transmission communication units, and five helium unscramblers could be tested and evaluated in bench tests, saltwater tests, and psychophysical tests under laboratory and open-sea conditions. The goal is to achieve a system design readily adaptable to any and all types of diving situations from zero to 1000 feet depth and with a range (if acoustic transmission modes are employed) of 4000 yards. The resultant one or more "best" combinations of components would become an interim model for Navy-specified design and purchase. Any inadequate subunits of the total system should be revealed by the project testing procedures, yielding indications for improved design and/or function. The detailed bench tests are almost complete; the sea test plans are written and testing should begin shortly. The completion date of the project is August 1972. Additional information can be found in NSRDL, Panama City reports, or obtained from James Elkins, Project Manager, or from Mr. Frank Romano, Naval Ship Systems Command, Arlington, Virginia.

Talker-Listener  Factors  of  the  Total 
Communication  Chain 

As pointed out by several of the workshop participants, any speech communication network inseparably includes the speaker (encoder) and the listener (decoder) as well as any transmission link carrying the message. Questions were posed which asked, generally, whether there should be attempts to train speakers and listeners and, if so, how, and how much.

Since, in reality, there is no underwater speech transmission equipment in routine operational use, the present training of divers consists of learning the code of tugs on the lifeline and the international system of hand signals used by SCUBA divers. In anticipation of functional voice communication systems, two research efforts were reported, one series from the University of Florida and another just starting with Westinghouse, aided by the University of Iowa. The data from the first series indicated that divers who were provided voice communication equipment took longer to perform a group of underwater tasks than divers who could not talk to each other. These unanticipated findings are somewhat suspect, being confounded by many uncontrollable variables, not the least of which was that the divers using communication systems were not trained in their use and were probably distracted by the noise and noise-masked speech as well as by the highly distorted speech. The Westinghouse study will attempt to train heliox speech listeners in the way a foreign language would be taught.

The question was posed by CDR Joseph Bloom: would talker-listener training be necessary, since the preceding two days of the conference seemed to show that the transmission and translation of the speech signals had been developed to the point that the "black box" between the talker and listener offered no further problems? In response, opinions were expressed which ranged from "training would be minimal" to "even under completely distortion-free conditions personnel need experience with a system in order to derive maximum utilization." Several concrete suggestions and rationales were given for improving the talker or listener portion of the "communications chain."

However, several times during the general discussions and in subsequent "after-session" talk it was stated and restated that the apparent state of advancement in translator and microphone development was built upon evidence acquired under idealized or controlled laboratory conditions. The closest approaches to realistic situations were in chambers simulating depth, rarely even in a wet pot, with little if any ambient noise of masks, valves, bubbles, etc. confounding the data. Research and development must take these variables into consideration. Additionally, only one method of helium-speech translation has been exploited, and that is time sampling (the heterodyne system was of course tried, but most designers agree it is not a workable method, especially with the circuitry available one or two years ago). Other methods, such as the vocoder, digital coding-decoding, or some hybrid of these methods with time sampling or heterodyning, may yet prove to be more resistant to the speech distortion and masking sources encountered in operations. A viable program of research is still needed until efficient voice communication can support efficient underwater work.
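Simple arithmetic shows why heterodyning is a poor fit. Helium raises the formants roughly by a common factor, whereas a heterodyne stage shifts every frequency down by the same number of hertz, so it can put at most one formant back in place. The sketch assumes an idealized vowel with formants at 500, 1500 and 2500 Hz and a uniform helium factor of 1.8; both values are illustrative only, and a purely proportional shift is itself an idealization (pressure raises the first formant non-linearly, as discussed later in this report).

```python
# Hypothetical formant values (Hz) for a vowel spoken in air.
air_formants = [500.0, 1500.0, 2500.0]
# Assumed uniform helium factor: every formant is multiplied by it.
helium_factor = 1.8
helium_formants = [f * helium_factor for f in air_formants]


def heterodyne(freqs, offset_hz):
    """Heterodyning subtracts the same offset from every frequency."""
    return [f - offset_hz for f in freqs]


def rescale(freqs, factor):
    """Frequency division: every frequency is divided by the same factor."""
    return [f / factor for f in freqs]


# Choose the heterodyne offset so that the first formant lands where it belongs.
offset = helium_formants[0] - air_formants[0]
after_heterodyne = heterodyne(helium_formants, offset)  # about 500, 2300, 4100 Hz
after_rescale = rescale(helium_formants, helium_factor)  # about 500, 1500, 2500 Hz
```

Division by a common factor (in essence, the result a time-sampling translator aims for) restores all three formants, while the heterodyne leaves the second and third formants 800 Hz and 1600 Hz too high. Uniform division also lowers the voice fundamental, which helium leaves essentially unchanged; how a translator handles that is outside this sketch.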

Diver-Talker  Selection  and  Training 

As mentioned above, the novice diver talker-listener will benefit from experiencing voice communications and the system operations in the environment in which he uses them. An analogy was drawn from data obtained in aircraft communications, in which student pilots were given four hours of training in aircraft noise using the equipment they would encounter. On the average, their intelligibility scores improved 15-20 per cent during training. Follow-up testing some four years later revealed that they had retained their training without loss.

In spite of the recent advancements made in diver communication system components, much research is still needed to determine what factors contribute to system integrity or failure when the components are combined into various configurations. There are but limited data concerning the effects of hyperbaric pressures upon the diver as a talker and none upon the diver as a listener. Several of the participants raised the question of whether or not sidetone should be introduced into a system. The effects of spectrally distorted speech, time delay, intensity changes, etc. in modifying the speaker's sidetone in air have been rather extensively explored and show that such manipulations can either hinder or enhance talker intelligibility. The optimum sidetone manipulations for divers breathing a helium-oxygen gas, under varying pressures, in various masking noise environments both in and out of the water, need to be systematically studied.

It is possible, also, that no amount of training or equipment refinement will yield one hundred per cent transmission of voice messages. In such situations a diver lexicon may be needed in which communications will be limited in vocabulary and syntax. There are many examples of voice communication being possible only through the use of some standardized speech format, e.g., aircraft flight clearances, sound-powered phone talkers, etc. Additional explanations were given by Dr. Rothman of the University of Florida's Research Laboratory concerning the development (and description) of a diver's lexicon. He listed the groups sampled and expressed the desire to obtain more samples from many more groups of divers, especially those engaged in the operational tasks of salvage, repair, exploration and recovery. A systematized format for voice communication applicable to diving situations may provide the redundancy necessary for adequate transmission of speech information until communication systems achieve distortion-free capabilities. Even under ideal transmission conditions, a standardized format tends to help minimize semantic and linguistic confusion.


Dr. John Gill ventured the supposition that the previously mentioned techniques may still not solve all divers' speech problems. It may be that divers will not be chosen just as divers, but selected from a population of highly educated divers who could learn to speak well. One parameter for selection is to determine whether the diver has, or can achieve, a low vowel-consonant intensity ratio. Speakers who exhibit such characteristics tend to be more intelligible over most communication channels. This, of course, is the condition which engineers attempt to create artificially by the process of peak clipping.
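The link between peak clipping and a low vowel-consonant intensity ratio can be shown with two synthetic signals. In this sketch a strong 200 Hz sine stands in for a vowel and a sine at one twentieth of that amplitude, at 3 kHz, stands in for a weak consonant; the 8 kHz sampling rate, the frequencies, the amplitudes and the clipping threshold are all assumptions chosen for illustration, not measurements of speech.

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak_clip(x, threshold):
    """Hard peak clipping: limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, v)) for v in x]


def ratio_db(vowel, consonant):
    """Vowel-to-consonant intensity ratio in dB, from RMS levels."""
    return 20.0 * math.log10(rms(vowel) / rms(consonant))


FS = 8000  # assumed sampling rate, Hz
N = 800    # 0.1 s of signal
# Hypothetical stand-ins: a strong low-frequency "vowel" and a weak high-frequency "consonant".
vowel = [1.0 * math.sin(2 * math.pi * 200 * i / FS) for i in range(N)]
consonant = [0.05 * math.sin(2 * math.pi * 3000 * i / FS) for i in range(N)]

before = ratio_db(vowel, consonant)
threshold = 0.05  # clip at the consonant's peak level
after = ratio_db(peak_clip(vowel, threshold), peak_clip(consonant, threshold))
```

Here the ratio falls from about 26 dB to under 3 dB. Amplifying the clipped signal afterwards, as a clipping circuit normally does, scales both parts equally and leaves the reduced ratio unchanged.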
Modification or Preprocessing of the Speech Signal

Additionally, Dr. Gill stated that when speech is to be processed by some "translator" device, a talker who employs "more-than-average" vocal force should enrich the overtone structure of his speech and allow the processing device to work on a signal having high "information" content, particularly if the loud speech is articulated in a very precise, "snappier" manner.

Another preprocessing method was postulated in which the diver would use whispered speech, which is not so closely related to volume velocity changes, and the whisper would then be amplified strongly, since its vowel-consonant intensity ratios are low. Because the dynamic range is relatively small, selective filtering or digital processing techniques may enable the whispered speech to be "translated" by simpler methods than time sampling. Consistently produced whispered speech would require some training on the part of the diver, especially when he experiences episodes of extreme stress.

Other preprocessing techniques were mentioned, including high-frequency preemphasis before translation, peak clipping and instantaneous speech-envelope compression. All of these modifications of the speech signal have been used in airborne voice communication systems with varying degrees of success. Their usefulness under helium-oxygen hyperbaric conditions needs experimental verification.
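High-frequency preemphasis is commonly implemented as a first-order difference filter, y[n] = x[n] - a·x[n-1], which attenuates low frequencies relative to high ones. The sketch below uses that textbook form; the coefficient a = 0.95 and the 8 kHz sampling rate are assumptions, not values discussed at the workshop.

```python
import cmath
import math


def preemphasize(x, a=0.95):
    """First-order preemphasis: y[n] = x[n] - a * x[n-1], taking x[-1] = 0."""
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y


def gain_db(freq_hz, fs_hz, a=0.95):
    """Magnitude response of the same filter, |1 - a * exp(-j*w)|, in dB."""
    w = 2 * math.pi * freq_hz / fs_hz
    return 20.0 * math.log10(abs(1 - a * cmath.exp(-1j * w)))


# Relative boost of a 3 kHz component over a 200 Hz component at an assumed 8 kHz rate.
boost = gain_db(3000, 8000) - gain_db(200, 8000)
```

With these values the 3 kHz component gains about 21 dB relative to the 200 Hz component, which is the kind of tilt that would offset a falling source spectrum before translation.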


Communication  Among  Pressure 
Chamber  Subjects 

Subjects who are employed in hyperbaric chamber experimentation or indoctrination are reluctant to be restricted in their movements by earphone and microphone cords. There is some evidence, provided by the analyses of Golden22 and McLean5*, that divers tend to modify their speech over a period of time under hyperbaric conditions (at least to 205 feet equivalent depth). They found that over a period of two weeks, speakers, on the average, changed their speech in such a way that there was a downward shift in the second and third formants. The divers also report that they were able to understand each other better as time progressed, without resorting to microphones and translators (Sergeant44 reports similar findings).

This "adaptation" phenomenon needs investigation to determine what is being modified and whether auditory feedback can speed the process by selectively eliminating from the feedback the factors which would allow the speaker to compensate in the "right" direction. However, some evidence as to the possibility of adequate intercommunication among subjects in hyperbaric chambers was given by Dr. John Gill, concerning the British Admiralty's 1500-foot dive, in which the subjects passed around a high-quality microphone connected to a distortion-free amplification system which fed a loudspeaker in the chamber. He reports "good" communications, especially when the microphone-loudspeaker arrangement was such as to cause no acoustic feedback. The newly developed gradient-type microphone described by Dr. Charles Morrow should be an aid in this situation.

Factors of Quick Solutions to Operational Problems

This  portion  of  the  discussion  was 
initiated  by  several  operational  Navy 
representatives  who  asked  what  could 
be  done  quickly  to  provide  speech 
communications  for  hard-hat  divers. 
The need has long been apparent and they felt the situation was desperate.

The principal disturbing factor seems to be a very high noise level inside the helmet, probably arising from the valving and venting of the breathing gases. Opinions were given that many things could be done initially without requiring extensive research. Good consultation and application of engineering techniques should result in a usable communication system and an improved communications environment. The previously mentioned gradient microphone should be tried as one component.

An extension of the above topic led to several expressions of the feeling that there were communication gaps between Management, Operations and the Scientists. As seen by Operations, both Management and the Scientists tend to produce many fanciful solutions to immediate problems, liberally embellished with descriptions of the difficulties of solving them. Several representatives of Science, on the other hand, lamented a situation in which they felt both Management and Operations did not provide enough chances to do careful studies during operational dives. Whenever such chances did occur, there seemed to be insufficient time to obtain the requisite number of speech samples or to plan a satisfactory experiment. It was suggested by certain scientists and operations personnel that the most expedient and perhaps most satisfactory solution to the communications gap was to establish personal contacts and to maintain and foster such contacts. The "personal approach" would not be an attempt to circumvent Management, since any studies or requests for aid must include Management, but it would allow preliminary planning of some of the basic needs and feasibility statements, and the smoothing of possible sources of disagreement.

Comments were made indicating that potential contractors and Navy personnel have been periodically confused because of an apparent lack of "singleness of purpose," or a fractionation of efforts, among the various Navy organizations which have (and have had) an interest in and/or responsibility for underwater communications. It was suggested that a strong organization or central office be established which would coordinate, assign projects, and provide the funds for the best possible solutions to problems that the environment will allow. These suggestions were made with the realization that money and priorities are an ever-changing "balance-of-power" interaction; yet a single coordinator could minimize the struggle for program priorities among the various Navy organizations which have, or have developed, an interest in underwater speech communications.

NavOP-23, the Supervisor of Salvage and Diving, the Navy Experimental Diving Unit, the Bureau of Medicine and Surgery, the Office of Naval Research, and others have been instrumental in initiating and maintaining efforts directed toward underwater communications. If delays in present efforts are encountered, or if the results of present efforts do not provide adequate systems, then perhaps a Task Unit should be created by NavOP-23, with members from appropriate Naval organizations, to remain in existence until divers can talk to each other easily and reliably.

At present, lacking such a central Task Unit, a suggested procedure for all Navy personnel involved in communication systems research and development is to maintain and foster strong interpersonal relationships with their management, hoping to influence them to increase the number of scientific-engineering discussions concerned with underwater communications. The results of the discussions should then be fed back to management as rapidly as possible. From such discussions, management, operations and science may be able to assign appropriate priority rankings to viable problem solutions with minimal delay, which, it is hoped, will accelerate operational capability.

In summary, the open-forum discussion seemed to indicate that the Navy should soon know whether it will have an interim all-diver speech communication system. If the systems now under test are not adequate, suggestions were made as to several "next steps," including: personnel selection and training; speech preprocessing methods; and alternative helium-speech translating methods. Suggestions were also made by which information feedback to management, operations and science could be accelerated and utilized in planning and problem solution.
WORKSHOP SUMMARY AND COMMENTS
by  W.  Wathen-Dunn 


My  function,  as  I  see  it,  is  not  to 
summarize  in  great  detail  what  each 
speaker  said,  but  rather  to  present  an 
integrated  picture  that  will  tell  us 
where  we  are  now  and  show  in  what 
direction  we  ought  to  go  to  reach  our 
objectives. 

The  problems  we  have  are  three  in 
number.  The  first  is  the  classical 
problem  of  describing  and  explaining 
the  phenomena  with  which  we  have  to 
deal.  The  second  is  that  of  trying  to 
do  something  about  the  limitations  they 
impose,  and  the  third  is  to  evaluate 
the results of our efforts.

In approaching these problems, I cannot emphasize too strongly the concept, advanced by several speakers, that it is vital to look at the over-all system, not just the components. It is also necessary to remember that the system includes human beings both as talkers and listeners and that this compounds our difficulties. A purely physical system exhibits much greater stability than one that includes people.

A diagram of the over-all system is given in Fig. 16. First, there is the speech generation process, and we need to know what the normal process is and the ways in which it is modified by operating at a different pressure, in a different gas and under a different load. These are the effects of environment, which may also add noise and/or reverberation. The signal, thus modified and degraded, is converted to electrical form by a transducer which feeds a transmission system about which I shall not talk. At the receiving end, there is another transducer to convert the signal to audible sound and then, finally, the speech perception process.

Fig. 16. (Wathen-Dunn) Diagram of "The Over-all System" for any voice communication.

We make a fundamental assumption, though no one here has stated it explicitly, that if you can convert the short-term speech spectrum into some semblance of the form it would have had if the speaker had been talking in a normal environment, the result will be perceptually acceptable to the listener. On this assumption, we shall have to insert somewhere in the system, either before or after transmission, a device whose job it is to get the spectrum back into shape.


To  understand  what  is  required  of 
this  device,  we  must  understand  the 
speech  generation  process  itself. 


Great insights have been gained by considering this to be a problem of network analysis and using methods developed by the electrical engineer to deal with it. He describes the process in terms of excitation functions that are applied to a "black box", as shown in Fig. 17. For our purposes, we need only recognize that speech requires a source of sound and that this is the excitation. There are two such sources: (1) voicing, produced by drawing the vocal cords together and forcing air to flow between them to make them vibrate; and/or (2) noise, produced by constricting the vocal tract at some point and forcing air through the constriction or, in some cases, closing the tract, building up air pressure behind the closure and then releasing it.

These excitations are time-varying functions, and there is inevitably associated with each one a spectrum. Frequencies are generated in the audible range, and there are mathematical processes that can transform the time functions into functions of frequency. The simplest is the Fourier transform, and with it we can express all the frequencies in the excitation.

Fig. 17. (Wathen-Dunn) The concept of excitation functions and the "black box" for the speech generation process. An excitation (voicing/noise) drives the "black box" of the vocal tract (impulse response) to produce speech; direct and inverse Fourier transforms relate each time function to its spectrum, so that

$$F_i(\omega) \cdot H(\omega) = F_o(\omega)$$

(source spectrum · system function = speech spectrum).

What are these frequencies? For voicing, the spectrum consists of a fundamental and a series of harmonics whose amplitudes fall off with frequency, as John Gill pointed out. The rate of fall-off is a function of vocal effort. It is more rapid for weak speech than for strong, and Gunnar Fant assumes it to be about 12 dB/octave for normal, conversational speech. The spectrum of the noise produced by turbulence in a constriction can be taken to be fairly uniform with frequency.
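The 12 dB/octave figure translates directly into harmonic levels: each doubling of frequency costs 12 dB, so harmonic k lies 12·log2(k) dB below the fundamental. The sketch below tabulates this for an assumed 125 Hz fundamental up to 4 kHz; the fundamental and the straight-line slope are illustrative assumptions, since real glottal spectra are not exactly straight lines.

```python
import math


def harmonic_levels_db(f0_hz, max_hz, slope_db_per_octave=-12.0):
    """Level of each voicing harmonic k*f0 (k = 1, 2, ...) up to max_hz, in dB
    relative to the fundamental, assuming a straight-line fall-off."""
    levels = []
    k = 1
    while k * f0_hz <= max_hz:
        # Harmonic k lies log2(k) octaves above the fundamental.
        levels.append((k * f0_hz, slope_db_per_octave * math.log2(k)))
        k += 1
    return levels


# Hypothetical voice with a 125 Hz fundamental, harmonics up to 4 kHz.
levels = dict(harmonic_levels_db(125.0, 4000.0))
```

Under this assumed slope the harmonic at 1 kHz lies 36 dB below the fundamental and the one at 4 kHz lies 60 dB below.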

These spectra are applied to the vocal tract, which plays the role of the "black box." The vocal tract is a passive thing. It does not generate sound, but it can modify the sounds presented to it. Its transmission properties, characterized by a system function, attenuate some frequencies while reinforcing others and impose phase delays that likewise vary with frequency. In other words, the vocal tract affects the amplitude and phase of every frequency in the excitation. If we multiply the frequency function of the excitation by the system function of the vocal tract, we get the spectrum of the speech output, which may be converted to the output time waveform by an inverse Fourier transform. In doing this, radiation effects can be lumped with either the excitation spectrum or the system function.
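The multiply-the-spectra description can be checked numerically. Filtering the excitation through the vocal tract in the time domain is a convolution with the tract's impulse response, and its transform equals the product of the two spectra. The sketch below uses a discrete, periodic stand-in for the continuous Fourier transform (a naive 16-point DFT and circular convolution); the pulse-train "voicing" and the decaying "vocal tract" impulse response are hypothetical toy signals, not models from the talk.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); adequate for a 16-point illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """Time-domain filtering: each output sample sums source samples weighted by h."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


N = 16
source = [1.0 if t % 4 == 0 else 0.0 for t in range(N)]  # pulse train: toy "voicing"
impulse_response = [0.8 ** t for t in range(N)]          # decaying response: toy "vocal tract"

speech_time = circular_convolve(source, impulse_response)
speech_spectrum = [s * h for s, h in zip(dft(source), dft(impulse_response))]
speech_via_spectrum = [v.real for v in idft(speech_spectrum)]

max_error = max(abs(a - b) for a, b in zip(speech_time, speech_via_spectrum))
```

The two routes agree to rounding error, which is the property that lets the excitation and the vocal tract be treated separately in the frequency domain.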


That is the theory, but it might be most useful for us to think of the vocal tract as an acoustical transmission system that, like an organ pipe, has resonances. These resonances are called formants. They evince themselves as peaks, or humps, in the spectrum: places where excitation frequencies are reinforced. The result for a typical vowel is shown in Fig. 17.

Finally, the vocal tract is not only passive; its system function also changes continuously as we move from one articulatory configuration to another. Incidentally, our perception of speech includes the perception of articulation.

We can mimic. If we hear a speech sound, we can manipulate our whole articulatory apparatus so as to repeat it, because knowledge of the articulatory position is implicit in our perception of the sound.

The foregoing describes the normal speech generation process. What modifications are caused by a different gas, pressure and loading? First, we are all aware that helium in the gas causes the formant frequencies to shift upward because of the increase in sound velocity, but it is important to note that this places these passive resonances in positions where there is less energy to excite them, because the excitation spectrum falls off with frequency. I assume it falls off in much the same way that it does for air, though I know of no one who can assure me on this point. Certainly, the fundamental frequency of vibration of the vocal cords is unchanged by the introduction of large amounts of helium at normal atmospheric pressure.
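The size of the upward shift can be estimated from the ideal-gas speed of sound, c = sqrt(γRT/M). The sketch below is a back-of-the-envelope model, assuming ideal-gas behavior at 293 K, a hypothetical 98% helium / 2% oxygen mix, and resonances that scale in proportion to c (an assumption that, as the next paragraph notes, breaks down for the first formant at depth). It also converts the shift into the extra source attenuation implied by a 12 dB/octave fall-off.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

# (molar mass in kg/mol, Cv/R) for ideal-gas components near room temperature
GASES = {
    "He": (0.0040026, 1.5),   # monatomic
    "O2": (0.0319988, 2.5),   # diatomic
    "N2": (0.0280134, 2.5),   # diatomic
}
AIR = {"N2": 0.79, "O2": 0.21}

def sound_speed(mole_fractions, temp_k=293.15):
    """Ideal-gas speed of sound in a mixture: c = sqrt(gamma * R * T / M)."""
    m = sum(x * GASES[g][0] for g, x in mole_fractions.items())
    cv = sum(x * GASES[g][1] for g, x in mole_fractions.items())
    gamma = (cv + 1.0) / cv            # Cp = Cv + R for an ideal gas
    return math.sqrt(gamma * R * temp_k / m)

def formant_shift(mix, reference=AIR):
    """Factor by which resonances rise, assuming they scale with sound speed."""
    return sound_speed(mix) / sound_speed(reference)

def source_loss_db(ratio, slope_db_per_octave=12.0):
    """Extra attenuation of the excitation at a formant raised by `ratio`."""
    return slope_db_per_octave * math.log2(ratio)

ratio = formant_shift({"He": 0.98, "O2": 0.02})
print(f"formant ratio {ratio:.2f}, source loss {source_loss_db(ratio):.1f} dB")
```

For this mix the model gives a ratio of roughly 2.7 and about 17 dB less excitation at each shifted formant.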
For high ambient pressures, we know that the minimum frequency of the first formant is raised, so that there is no longer a linear relation between the value of any formant for air at sea level and its value for air at depth. Jan Lindqvist made the important point that the effect of this is reduced in helium speech, because in that case all the formant frequencies are raised. To the best of my knowledge, the effects of pressure on the excitation spectrum are unknown, though they ought to be known, because significant changes in this spectrum, caused either by gas or by pressure, could have important implications for how you restore speech to normalcy. Some sort of amplitude compensation as a function of frequency may be necessary.

You might also want to introduce another kind of amplitude compensation. With increased pressure, the amplitude of the voicing excitation increases as the square root of the pressure, whereas that of the noise remains unchanged.9,11 This makes voiced speech sounds louder than unvoiced ones, which might lead you to process the speech in one of two different ways, depending on the voiced-voiceless distinction.
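If voicing grows as the square root of pressure while frication noise does not, the voiced-to-unvoiced level difference grows by 20 log10(sqrt(P)) = 10 log10(P) dB, or 10 dB at 10 atmospheres absolute. The sketch below shows one hypothetical way to act on the voiced-voiceless distinction: a periodicity-based voicing decision (chosen because it does not depend on where the formants sit, and helium leaves the fundamental unchanged) followed by a fixed gain on voiced frames. The detector, its thresholds and the choice to attenuate voiced frames are all assumptions for illustration.

```python
import numpy as np

def voiced_boost_db(pressure_atm):
    """Predicted level of voicing relative to frication (voicing ~ sqrt(P), noise constant)."""
    return 20.0 * np.log10(np.sqrt(pressure_atm))   # equals 10 * log10(P)

def is_voiced(frame, fs, fmin=60.0, fmax=400.0, threshold=0.5):
    """Periodicity test: normalised autocorrelation peak over plausible pitch lags."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return False
    lags = range(int(fs / fmax), int(fs / fmin) + 1)   # frame must be longer than fs/fmin
    peak = max(np.dot(frame[:-lag], frame[lag:]) for lag in lags)
    return peak / energy > threshold

def equalize(frames, fs, pressure_atm):
    """Hypothetical rule: pull voiced frames down by the predicted excess level."""
    gain = 10.0 ** (-voiced_boost_db(pressure_atm) / 20.0)
    return [np.asarray(f, dtype=float) * gain if is_voiced(f, fs)
            else np.asarray(f, dtype=float) for f in frames]
```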

A related question was raised in the discussion, namely, whether it is possible for a speaker to compensate by some learned modification in the way he talks. One suggestion would be to feed him sidetone from the output of the unscrambler; from this he might learn to articulate in such a way as to improve the received intelligibility.

This assumes that pressure has no appreciable effect on the motion of the articulators.


The external environment may consist of masks or helmets which, with their gas regulatory mechanisms, introduce several problems. A mask reacts with the vocal tract to distort the resonant system, and it may also impede the flow of exhaled gas so that voicing is affected. Gas supply and discharge mechanisms create noise that ought to be reduced as much as possible, for it further complicates the spectrum restoration process. Helmets have a serious resonance in an important part of the speech spectrum. A redesign of the interior shape of diving helmets and the introduction of some damping would be beneficial.

For helium speech it is necessary to have a wide-frequency-range system ahead of the unscrambler, but it seems that only recently has this been implemented. The microphone, of course, is the first component in such a system, and I gather that Dr. Morrow's microphone fills the bill. A response that rises with frequency, provided either in the microphone or in the associated preamplifier, can be used to compensate, at least in part, for the lower formant amplitudes in helium speech caused by the falling source spectrum.
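In digital form, a rising response is commonly obtained with a first-order pre-emphasis filter, y[n] = x[n] - αx[n-1]. This is a standard technique rather than the microphone or preamplifier discussed at the workshop; α = 0.95 and the 20 kHz sampling rate are assumed values. Well below the Nyquist frequency one stage rises at roughly 6 dB/octave, so two stages would be needed to offset a full 12 dB/octave source fall-off.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

def pre_emphasis_gain_db(f_hz, fs, alpha=0.95):
    """Magnitude response |1 - alpha * e^{-jw}| of the filter, in dB."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float) / fs
    return 20 * np.log10(np.abs(1 - alpha * np.exp(-1j * w)))

fs = 20000  # assumed sampling rate
print(pre_emphasis_gain_db(2000, fs) - pre_emphasis_gain_db(1000, fs))  # about 5.8 dB per octave
```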

Yesterday, Craig Allen indicated three different methods for unscrambling helium speech: (1) heterodyning; (2) "vocodering"; and (3) time-domain processing. The Navy Applied Sciences Lab unscrambler of the early 1960s used the heterodyne method. The signal was broken up into two or three bands that were shifted downward by differing amounts to make them occupy roughly the proper spectral region. One difficulty with this method is that it does not necessarily preserve the harmonic structure of the voicing, and this is a source of distortion.
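Why a frequency shift breaks harmonic structure can be shown with a single-sideband shifter. The digital sketch below (an analytic signal formed by zeroing the negative-frequency FFT bins, then multiplied by a complex exponential) is an assumption standing in for the analog equipment, and it shifts the whole signal by one amount rather than two or three bands by differing amounts. Harmonics at 200, 400 and 600 Hz shifted down by 150 Hz land at 50, 250 and 450 Hz, which are no longer multiples of the original 200 Hz fundamental.

```python
import numpy as np

def analytic(x):
    """Analytic signal via FFT: keep DC (and Nyquist), double positive bins, zero negative."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def heterodyne_shift(x, shift_hz, fs):
    """Shift every spectral component by shift_hz (negative means downward)."""
    t = np.arange(len(x)) / fs
    return np.real(analytic(x) * np.exp(2j * np.pi * shift_hz * t))

fs = 8000
t = np.arange(fs) / fs                      # one second, so FFT bins fall on whole hertz
x = sum(np.cos(2 * np.pi * f * t) for f in (200, 400, 600))
y = heterodyne_shift(x, -150.0, fs)
peaks = np.sort(np.argsort(np.abs(np.fft.rfft(y)))[-3:])   # -> 50, 250, 450 Hz
```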

The Stockholm delegation gave a good résumé of vocoder techniques. A vocoder analyzes the input speech spectrum by dividing it into contiguous subbands with a set of bandpass filters, after which it takes a running measure of the acoustical energy in each band.

At the receiver, it synthesizes a replica of the original speech by applying appropriate excitation to another set of contiguous bandpass filters, modulating the output of each one according to the running measure for that filter from the analyzer, and summing the outputs. Ordinarily the two sets of filters are identical, but the essential feature for our purposes is that the synthesizing filters can be made to occupy a lower and narrower frequency range and thus correct for helium distortions. The arrangement is inflexible, though, unless you can make the filters tunable in the way Mr. Allen suggested.
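The analyzer/synthesizer pair, with the synthesis bands moved lower and made narrower, can be sketched in the frequency domain. This is a simplification under stated assumptions: ideal brick-wall bands computed with an FFT stand in for analog bandpass filters and running energy measures, one frame is processed at a time, the band edges and compression factor are hypothetical, and the caller supplies the excitation (noise here, a pulse train for voiced frames).

```python
import numpy as np

def band_energies(frame, fs, edges):
    """Analyzer: energy of the frame in each contiguous band [edges[i], edges[i+1])."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def synthesize_frame(energies, excitation, fs, edges, compression):
    """Synthesizer: bands at edges/compression, each scaled to its analyzed energy."""
    spec = np.fft.rfft(excitation)
    freqs = np.fft.rfftfreq(len(excitation), 1 / fs)
    out = np.zeros_like(spec)
    syn_edges = np.asarray(edges) / compression          # lower, narrower synthesis bands
    for e, lo, hi in zip(energies, syn_edges[:-1], syn_edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        band_power = np.sum(np.abs(spec[band]) ** 2)
        if band_power > 0:
            out[band] = spec[band] * np.sqrt(e / band_power)
    return np.fft.irfft(out, n=len(excitation))
```

With a compression factor of 2.5, energy analyzed in a 3000-3500 Hz band is resynthesized in a 1200-1400 Hz band.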

Most, if not all, of the unscramblers we listened to yesterday work in the time domain. A segment of normal voiced speech has a waveform like that shown in Fig. 18a. The principal peaks are caused by voice pulses, and the time between them is the fundamental period. The lesser vibrations in between result from the ringing of the formants. A segment of voiced helium speech is illustrated in Fig. 18b. Here the formant vibrations occur much more rapidly and therefore crowd to the left, though the fundamental period remains unchanged. If you chop off the tail of the ringing at a suitable point and expand what is left to occupy the full voicing period, this stretches the waveform into some semblance of its original shape. This has to be done on a pitch-synchronous basis, but the amount chopped off can be varied continuously, which provides flexibility.

Fig. 18. (Wathen-Dunn) Acoustic waveforms of speech: a) normal voice; b) helium speech.
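The chop-and-expand operation can be sketched as follows, assuming the pitch-period boundaries (pitch marks) have already been found by some pitch detector and using linear interpolation for the stretching. Keeping a fraction f of each period and stretching it back to full length lowers every ringing frequency by roughly 1/f while leaving the period, and hence the fundamental, untouched. The function and its parameters are hypothetical; this is not a description of any of the devices demonstrated.

```python
import numpy as np

def expand_periods(x, pitch_marks, keep_fraction):
    """For each pitch period, keep its first keep_fraction and stretch it to the full period."""
    out = []
    for start, end in zip(pitch_marks[:-1], pitch_marks[1:]):
        period = end - start
        keep = max(2, int(round(period * keep_fraction)))   # chop off the tail of the ringing
        segment = x[start:start + keep]
        # resample the kept part onto `period` samples by linear interpolation
        src = np.linspace(0, keep - 1, period)
        out.append(np.interp(src, np.arange(keep), segment))
    return np.concatenate(out)
```

For example, with keep_fraction = 0.4, ringing at 2500 Hz inside 10 ms periods comes out near 1000 Hz, still at a 100 Hz fundamental.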

Digital processing offers other means for dealing with helium speech. One is homomorphic processing, which includes cepstral processing. Recall (from Fig. 17) that the product $F_i(\omega) H(\omega) = F_o(\omega)$, the output speech spectrum. If we take the log of both sides of this equation, the left-hand side becomes a sum, $\log F_i(\omega) + \log H(\omega)$, which suggests that suitable processing of $\log F_o(\omega)$ could separate the excitation and system-function components in some sense.

This turns out to be partially true. If we transform $\log F_o(\omega)$ back into a time domain, $t'$, we get what is called a "cepstrum." Concentrated about the origin is a function that represents the combination of source spectrum and vocal tract transfer characteristic. Further out is a series of peaks, spaced from the origin and from each other by the fundamental period, that are due to voicing. If we truncate this function so as to eliminate all the voicing peaks, the remaining portion adjacent to the origin can be transformed back into the frequency domain to give a smooth curve that represents the spectral effects of the transfer characteristic and source spectrum.
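Cepstral smoothing of one frame can be sketched with NumPy's real FFTs. Assumptions: the real cepstrum (log magnitude only) is used rather than the complex cepstrum, a Hann window is applied, and the lifter cutoff, in samples of t', is a free parameter that must be set below the shortest expected pitch period.

```python
import numpy as np

def smooth_log_spectrum(frame, cutoff):
    """Real-cepstrum smoothing of one frame.

    Returns (smoothed log-magnitude spectrum, cepstrum). Quefrencies at or beyond
    `cutoff` samples, where the voicing peaks lie, are zeroed before transforming back."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(np.asarray(frame, dtype=float) * np.hanning(n)))
    log_mag = np.log(mag + 1e-10 * mag.max())      # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n)          # the t' (quefrency) domain
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0                          # keep the part near the origin...
    lifter[n - cutoff + 1:] = 1.0                  # ...and its mirror image
    smooth = np.fft.rfft(cepstrum * lifter).real
    return smooth, cepstrum
```

For a 100 Hz voice sampled at 10 kHz, the first voicing peak of the cepstrum sits near t' = 100 samples, so a cutoff of about 40 samples keeps only the smooth envelope.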

Frank Quick worked with this curve.37,38 He squeezed it to the left to give the formants their proper positions, exponentiated it to remove the effects of taking the logarithm, and then dealt with the result as though it were simply a system function. He transformed this into the time domain to give an impulse response, which he convolved with either regular or irregular pulses to reconstruct the speech. The output was far more intelligible than the original and might have been better if he had had a better tape of helium speech to start with.
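The steps attributed to Quick (squeeze, exponentiate, treat as a system function, transform, convolve with pulses) can be sketched as below. The details are assumptions, since the source does not give them: the smoothed log spectrum is taken to be sampled on a uniform frequency grid, the squeeze is done by linear interpolation, and the system function is treated as zero-phase.

```python
import numpy as np

def warp_envelope(log_env, compression):
    """Squeeze a log spectrum toward 0 Hz: new[b] = old[b * compression], compression > 1."""
    bins = np.arange(len(log_env))
    return np.interp(bins * compression, bins, log_env, right=log_env[-1])

def resynthesize(log_env, compression, pulse_positions, n_samples):
    """Squeeze, exponentiate, inverse-transform, and convolve with a pulse train."""
    system = np.exp(warp_envelope(log_env, compression))   # undo the logarithm
    impulse = np.fft.fftshift(np.fft.irfft(system))         # zero-phase impulse response, centred
    pulses = np.zeros(n_samples)
    pulses[list(pulse_positions)] = 1.0                     # regular or irregular excitation
    return np.convolve(pulses, impulse, mode="same")
```

A formant peak at bin 150 of the helium envelope moves to bin 60 when squeezed by a factor of 2.5.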

A different method, based on predictive coding, has been developed by Bishnu Atal at Bell Labs. I shan't attempt to explain it, but it appears to yield a simple algorithm for obtaining the inverse filtering characteristic of the vocal tract, which is what we want to manipulate. Its simplicity allows it to be done in less computational time, and this gets to the heart of the matter: digital processing is useful only if you can build dedicated computers that are fast enough to do the required processing in real time and that are small and cheap enough to make it practical. Computers that fulfill these requirements are not available at the moment, but the state of the art seems to be progressing rapidly in that direction.
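Atal's method is not described here, so the sketch below shows only the standard autocorrelation formulation of linear prediction (the Levinson-Durbin recursion), which yields an inverse filter A(z) of the kind mentioned. It also converts the roots of A(z) to resonance frequencies; dividing those by the helium shift factor is one conceivable (hypothetical) way to manipulate them, not a method reported at the workshop.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method linear prediction via the Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., ap] are the coefficients of the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, and err is the final prediction-error power."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                 # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def resonances_hz(a, fs):
    """Frequencies of the complex roots of A(z) in the upper half-plane (candidate formants)."""
    roots = np.roots(a)
    return sorted(np.angle(z) * fs / (2 * np.pi) for z in roots if z.imag > 0)
```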


Meanwhile, these processing methods are a powerful research tool.

Lastly, we come to evaluation. I think there are three aspects of the speech signal that are important to preserve in a communications system.

The first, of course, is intelligibility; a second is talker identity; and a third is what I call the emotional content of the speech. These convey what was said, who said it and how it was said. In helium speech we seem to be happy if we can preserve intelligibility, and we haven't worried about the other aspects. There are tests for evaluating the intelligibility of a system, but it must be kept in mind that they do not yield truly absolute results. The best you can do is rank-order several systems using the same crew and test conditions at the same time, and even then a 2% difference is not very meaningful. For testing unscramblers, Cdr. Bloom made the valid point that the talkers and listeners ought, very likely, to be divers.
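A quick calculation supports the caution about small differences. Treating each test item as an independent right-or-wrong trial (an optimistic assumption, since talker and listener variability add further spread), the binomial standard error of a score is sqrt(p(1 - p)/n). For two hypothetical systems scoring 82% and 80% on 300 items each, the difference is well under one standard error of the difference, whereas about two standard errors are conventionally wanted before calling a difference real.

```python
import math

def score_std_error(p, n_items):
    """Binomial standard error of a proportion-correct score (items assumed independent)."""
    return math.sqrt(p * (1.0 - p) / n_items)

def difference_in_std_errors(p1, p2, n_items):
    """How many standard errors apart two independently measured scores are."""
    se = math.hypot(score_std_error(p1, n_items), score_std_error(p2, n_items))
    return (p1 - p2) / se

print(round(difference_in_std_errors(0.82, 0.80, 300), 2))  # 0.62
```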

There is just one more point. Closed message sets were discussed, and they might possibly increase intelligibility. However, from the point of view of information theory, it is the unexpected message (the least probable one, the emergency one) that carries the greatest amount of information, and this ought to be considered in constructing any closed message set.
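The information-theoretic point is that a message of probability p carries -log2 p bits. With made-up probabilities for a three-message closed set, the routine message carries about 0.15 bit while the rare emergency message carries about 10 bits; the message texts and probabilities are purely hypothetical.

```python
import math

def self_information_bits(p):
    """Information carried by a message of probability p, in bits."""
    return -math.log2(p)

# Hypothetical closed message set with made-up probabilities (they sum to 1).
messages = {"all well": 0.90, "send more gas": 0.099, "emergency, bring me up": 0.001}
for text, p in messages.items():
    print(f"{text:24s} {self_information_bits(p):5.2f} bits")
```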